Since the launch of the Journal Impact Factor half a century ago in 1975, the bibliometric assessment of research outputs has proven a lasting and important, if flawed, method of quantifying impact. When it comes to new or non-traditional output forms, ascertaining their use through mentions in the literature has been a challenging endeavor. While tools now exist to facilitate this process at scale, applying them remains a significant challenge. The work of the French Open Science Monitor (FOSM) illustrates both the effort required and a possible way to improve the situation across the literature.
During the Fiesole Retreat last month, Laetitia Bracco, Head of Research Data and Bibliometrics Support Services at the Université de Lorraine, presented the work of the FOSM and some of the challenges it has faced in gathering data on research data sharing. Intrigued by the presentation and its implications, I approached Ms. Bracco after her talk to discuss in more detail what the FOSM was doing and what its broader implications might be. Ms. Bracco, her colleague Eric Jeangirard, Data Scientist at the French Ministry of Higher Education and Research, and I then discussed the FOSM and its work. Having spent a great deal of time on questions of assessment and data citation, I am keen to see how the publishing community can better support references to non-traditional research outputs in the scholarly literature. Below are some of the exchanges that resulted from that conversation, with responses to my questions from Ms. Bracco and Mr. Jeangirard.
Can you tell us a bit about the background of the FOSM?
Since 2021, the French Ministry of Higher Education and Research, the Université de Lorraine, and Inria have been working together to extend the French Open Science Monitor (FOSM) to research data and software. The FOSM, developed by the French Ministry, was originally dedicated to scientific publications. During this project, many exchanges took place between the French team and foreign institutions with similar goals: to capture the variety of research outputs and measure their openness.
These numerous exchanges led the French team to a conclusion: more and more countries, organizations, and institutions are monitoring Open Science, but with no common guidelines. This was the starting point of the Principles of Open Science Monitoring, first drafted in France, then presented to a panel of experts during an international workshop at UNESCO in December 2023, and finally refined through an international consultation involving more than 170 people in 40 countries (to learn more, visit the OSM website).
Beyond those Principles, the need to discuss, co-create, and develop Open Science monitoring frameworks at the international level became clear. This is how the Open Science Monitoring Initiative (OSMI) was created, initially launched by the French Ministry of Higher Education and Research, the Université de Lorraine, Inria, PLOS, SPARC Europe, UNESCO, and Charité Universitaetsmedizin Berlin, and now involving nearly 180 people in 46 countries. Within OSMI, four working groups have recently been launched, among them WG4, Shared resources and infrastructure to analyze scholarly outputs.
What are the goals of the OSMI generally?
The Open Science Monitoring Initiative brings together institutions and individuals involved in monitoring open science. Its goals are to promote the worldwide adoption of the Principles of Open Science Monitoring, provide recommendations on technical specifications for their implementation, and support stakeholders at various levels in monitoring Open Science practices. In addition to the Principles, whose final version will be released in July 2025, OSMI has four working groups.
At the Fiesole meeting, you described an interesting project to identify references to research datasets. How are you currently undertaking this effort to extract impact information from PDFs?
Within the FOSM project, we are indeed analyzing the full text of French scientific publications to identify mentions of datasets and software. The objective is to compute indicators about research data sharing and research software sharing. The ultimate goal is to steer the sharing practices for datasets and software, which is part of the French national open science plan. More specifically, AI algorithms are used to detect all mentions of data and software in a full text. For each mention detected, the algorithm also evaluates whether it is a mention of use, creation, or sharing. Finally, global indicators are calculated from these data.
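To make the approach concrete, here is a minimal sketch of how a single detected mention might be classified as use, creation, or sharing. It assumes a generic zero-shot model from the Hugging Face transformers library; the FOSM pipeline itself relies on purpose-built detection models, and the sentence and accession number shown are placeholders.

```python
# Minimal sketch of mention-type classification, assuming a generic
# zero-shot NLI model. This is NOT the FOSM production pipeline, which
# uses purpose-built detectors over full text at scale.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

# Placeholder sentence with a made-up accession number.
mention = ("The RNA-seq data generated in this study have been deposited "
           "in the GEO repository under accession GSE00000.")

labels = ["use of a dataset", "creation of a dataset", "sharing of a dataset"]
result = classifier(mention, candidate_labels=labels)

# result["labels"] is sorted by descending score; the top label stands in
# for the mention type that would later feed the global indicators.
print(result["labels"][0], round(result["scores"][0], 3))
```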

What problems have you faced?
Our first problem has been getting access to the full texts. For publications that are still under closed access, we first had to describe the project to the publishers so that they would let us download the publications. We are authorized to undertake this kind of operation thanks to the European text and data mining legal framework, but not all publishers comply easily with the legislation. For instance, with Elsevier it took us more than six months to gain the access we were entitled to; now that those issues have been addressed, the system works as intended.
Computation costs are also an important factor. The AI algorithms require considerable resources when used at large scale, such as on around one million (French) publications. This remains a limiting factor when it comes to scaling up globally, especially as all the calculations have to be redone with each new version of the detection models. The algorithms are, of course, not perfect, with performance varying by scientific domain. They are also not very effective in non-English languages. Recent advances in AI, in particular LLMs, have improved performance, but there is certainly room for further gains.
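To illustrate why these costs bite at scale, here is a back-of-envelope sketch; every figure in it (per-document inference time, GPU count) is an assumption for illustration, not a measured FOSM number.

```python
# Back-of-envelope scaling estimate; all figures below are illustrative
# assumptions, not measured FOSM numbers.
DOCS = 1_000_000        # roughly the size of the French corpus analyzed
SECONDS_PER_DOC = 5.0   # assumed full-text inference time on one GPU
GPUS = 8                # assumed number of available accelerators

gpu_hours = DOCS * SECONDS_PER_DOC / 3600
wall_clock_days = gpu_hours / GPUS / 24
print(f"{gpu_hours:,.0f} GPU-hours, about {wall_clock_days:.0f} days on {GPUS} GPUs")

# Every new version of the detection models repeats this full cost,
# which is why re-running the analysis at global scale is a limiting factor.
```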
What can researchers and publishers do to improve this situation?
Researchers can pay extra attention to making the proper connections between their research outputs. For instance, when submitting a paper for publication, they should always link it to the related dataset and/or software, whenever relevant. But the role of publishers and data repositories is extremely important too. For example, they can make this linking process easier through dedicated fields when a paper or a dataset is submitted. The use of dedicated PIDs for datasets, and also for software (SWHIDs in particular), remains underutilized, with adoption rates varying widely by discipline. Generally speaking, these topics will also be explored by OSMI through WG3, Open science monitoring with scholarly content providers.
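As a purely illustrative sketch of what such a link can look like once captured at submission time, here is a record loosely modeled on DataCite-style relatedIdentifiers metadata; the field names are an assumption rather than a validated schema, and every identifier shown is a placeholder.

```python
# Illustrative sketch of article-to-output links captured at submission.
# Field names are loosely modeled on DataCite relatedIdentifiers and are
# an assumption, not a validated schema; all identifiers are placeholders.
article_record = {
    "article_doi": "10.1234/example.article",
    "related_outputs": [
        {
            "relation": "IsSupplementedBy",
            "identifier_type": "DOI",        # dataset PID
            "identifier": "10.5678/example.dataset",
        },
        {
            "relation": "References",
            "identifier_type": "SWHID",      # software PID (Software Heritage)
            "identifier": "swh:1:dir:" + "0" * 40,  # placeholder SWHID
        },
    ],
}
```

Capturing these links as structured metadata at submission, rather than recovering them later from the full text with AI, is precisely what would reduce the detection burden described above.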
Can you describe the potential value that publishers and researchers might derive from this work?
For researchers, the potential benefits are huge: gaining more recognition for their work, especially for research outputs that are not publications. Sharing datasets is time-consuming and, for now, not sufficiently rewarded. Making datasets more visible is also a way to attract more citations. As noted above, the rate of data and software citation is increasing, but slowly. For publishers, facilitating the citation of datasets and software will only bring more traffic to the articles and therefore more visibility to the journal. What's more, with the efforts of initiatives such as CoARA, to ensure that these research products are better recognized in the evaluation process, or Make Data Count, to better quantify the use and reuse of research data, journals that are technically ready to make use of these metrics will be all the more attractive to researchers. Finally, for society as a whole, the transparency and availability of research results, particularly in the field of health, is a collective responsibility. In France, a recent interministerial report puts forward recommendations to improve the situation in the health sector.
We have done a lot of groundwork to create the infrastructure to support data and software citation practices. How can we motivate adoption? Do you see France or the EU (or other funders) mandating or regulating these kinds of practices, given that they need them for assessment purposes?
In France, the national ecosystem for research data is called Recherche Data Gouv. Launched in 2022, it provides researchers with a generalist data repository and a network of research data management support services deployed throughout the country. Sharing research data is thus strongly encouraged and facilitated by this infrastructure. It is also recommended that software be deposited in HAL (our national open archive for publications) and in Software Heritage. Funding bodies such as the Agence Nationale de la Recherche (French National Research Agency) require funded projects to share not only publications but also data and software whenever possible. This is also included in many roadmaps and recommendations from higher education and research institutions, such as my own university, the Université de Lorraine. These incentives have already borne fruit, with an increase in the number of datasets shared over the years. However, only if these citations and their impact are genuinely taken into account in the individual assessment of researchers can the situation change radically.
I note the newly launched OSMI working group, “Shared resources and infrastructure to analyze scholarly outputs”. What do you hope to achieve via this group? What can the community do to support this work?
This working group is co-chaired by Pragya Chaube (Open Science South Asia Network and CODATA Connect) and Cristina Huidiu (Wageningen University and Research). It aims to develop a collective framework for extracting and sharing essential metadata from scientific publications, such as author affiliations, data use, software sharing, and funding. Its main objectives are to foster the use of state-of-the-art open source tools for extracting information from full text; to encourage the sharing of the extracted metadata within a framework consistent with full-text access licenses; and to meet the considerable computing and storage requirements via resource pooling mechanisms. Building open and trusted AI tools for full-text analysis is a major challenge. The community is more than welcome to join the working group, which has just started, and share ideas to help fulfill these goals. Those interested in engaging are welcome to reach out.
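As a purely hypothetical sketch of the kind of shared record such a framework might standardize (the schema below is my assumption, not a WG4 specification):

```python
# Hypothetical sketch of an extracted-metadata record of the kind the
# working group might standardize; this schema is an assumption, not a
# WG4 specification.
from dataclasses import dataclass, field

@dataclass
class ExtractedRecord:
    publication_doi: str
    affiliations: list[str] = field(default_factory=list)
    dataset_mentions: list[str] = field(default_factory=list)   # e.g., DOIs or accession numbers
    software_mentions: list[str] = field(default_factory=list)  # e.g., SWHIDs or repository URLs
    funders: list[str] = field(default_factory=list)
    full_text_license: str = "unknown"  # governs whether the record itself may be shared

# Placeholder example record.
record = ExtractedRecord(
    publication_doi="10.1234/example.article",
    affiliations=["Université de Lorraine"],
    dataset_mentions=["10.5678/example.dataset"],
    funders=["Agence Nationale de la Recherche"],
)
```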
Later this summer, on July 7-8, 2025, OSMI will host a two-day meeting in Paris, with virtual participation also available. Everyone is encouraged to continue to follow the work of OSMI and its working groups. If scholarly communications is to extend beyond traditional content distribution models, such as articles, there will need to be greater recognition of researchers' work to motivate data sharing. OSMI and FOSM are both driving recognition for open data sharing, while also highlighting the challenges that currently exist in computing these assessment metrics.