Access to and Reuse of Research Software
Position paper of the Helmholtz Association Open Science Working Group, March 2017
As the digitalization of research and teaching progresses, scientific institutions use a growing number of software solutions to process research data. In this context, the term “research software” refers to program code that is developed and/or reused in the course of a science-related activity. In many fields, the verifiability and reproducibility of scientific results called for under the heading “open science”1 can be ensured only if documented and transparent program code is made openly accessible alongside the research data. Thus, in addition to open access to publications (open access) and open access to research data (open research data), open access to and reuse of research software (open research software) is an essential element of open science.
In recent years, new methods have made it possible to make research data available successfully. For research software, by contrast, this development is still in its infancy, and the debate on the modern management of research software has gained considerably in relevance. Contributions in the journals Nature2 and Science,3 as well as diverse transdisciplinary initiatives,4 discipline-specific approaches,5 and emerging publication strategies,6 demonstrate the need to change existing procedures. Funding organizations, such as the European Commission within the framework of the Open Research Data Pilot under HORIZON 2020, request that, in addition to the research data generated by funded projects, the software used by these projects also be made publicly available.7 Moreover, research software is increasingly being taken into account in the area of reporting.8
2. Debate in the Helmholtz Association
In April 2016, the Helmholtz Association Open Science Working Group set up a task group to deal with this topic. In November 2016, this task group organized a two-day Helmholtz Open Science Workshop entitled “Access to and Reuse of Research Software,”9 in which representatives from the areas of research, information infrastructure, and administration at scientific institutions in Germany took part. Besides keynotes and short presentations, problems, challenges, solutions, and recommendations for action in the following topic areas were discussed at the workshop in thematically focused sessions:
- Technical infrastructures
- Standards and quality assurance
- Licensing and other legal aspects
- Citation and recognition
- Visibility and modularity
- Business models
- Personnel, training, career paths
With the workshop, the Helmholtz Association provided an impetus for national exchange and the networking of actors for joint activities in these topic areas. The results of the workshop have been summarized in a report.10 To promote further dialogue, a public mailing list was set up.11
3. Debate in the Alliance of Science Organizations in Germany
To promote debate on this topic in Germany, and to ensure that the science organizations develop a coordinated position on this field of action that is anchored in international developments, the ad hoc Research Software Working Group was established at the suggestion of the Helmholtz Association within the framework of the priority initiative Digital Information of the Alliance of Science Organizations in Germany. The task fields of the working group are as follows:
- Coordination and trans-organizational dialogue
- Support and promotion of the debate on the topic in Germany
- Support for initiatives for open source software solutions that have great potential for the scientific community
- Formulation of recommendations for good scientific practice in research software
- Networking with international initiatives
The Helmholtz-specific points set out in this position paper, as well as more general analyses and recommendations in the report of the Helmholtz Open Science Workshop “Access to and Reuse of Research Software,” can be incorporated here as a working basis.
In the following sections of this paper – using as a basis the topics “guidelines and policies,” “incentives,” “publication strategies,” “infrastructures,” “training and continuing professional development,” and “legal issues” – possible solutions will be presented and recommendations given for research software management at the Helmholtz Centers.
4.1 Guidelines and Policies
Key question: How can the topic be adequately integrated into policies of scientific institutions, funding organizations, and journals? Sustainable research software management calls for clear rules. Guidelines and policies help to ensure the quality, accessibility, and reuse of research software; to clarify open questions when exploiting it; to ensure that standardized licenses are used; and to guarantee the application of citation standards. Developing such guidelines and policies together with the relevant actors contributes to a shared understanding of software management at scientific institutions. Such a process also helps researchers to comply with funders’ increasing requests for accessibility of research software. In addition, it helps to ease the burden on individual researchers, for example, by making templates for software management plans available.
The Helmholtz Centers are advised to jointly develop guidelines and policies on research software management with relevant actors from research, information infrastructure, knowledge and technology transfer, and legal departments. These guidelines and policies should cover the entire life cycle of research software and include, in particular, the following areas:
- Software development and documentation practice
- Licensing, making available, and archiving code
- Publication and transfer strategies
- Citation modes
- Research infrastructure
- Recommendations for assuring the quality of the software
4.2 Incentives
Key questions: How can the development and use of open source research software be encouraged? How can researchers be supported in making research software available? How can the topic be anchored in the area of reporting and in the reputation system? What citation procedures can be developed? In the science system, the development and documentation of research software are only rarely recognized as a research achievement. Hence, there are no incentives for making software available.
In addition to text articles and research data publications, the publication of program code should be recognized as a stand-alone product of the research process. To this end, it is of great importance that publication strategies and citation standards be established for research software. The work of software developers in research should be measured in evaluations and recognized accordingly.
The Helmholtz Centers are advised to recognize the intellectual effort that goes into developing research software as a research achievement, by acknowledging software in publication databases as a stand-alone publication type and by measuring it by means of adequate and transparent metrics.
4.3 Publication Strategies
Key question: How can successful cooperation between software repositories, journals, and other information infrastructures be ensured? The development of research software is an important part of the research performance in digitalized science. Current research software management often does not do justice to the important role that software plays for the transparency and reproducibility of research results. Only rarely is software recognized as a stand-alone publication type. As a result, the programming achievement of the researchers is not recognized, nor do fixed procedures exist for assessing the software for exploitability.
The Helmholtz Centers are advised to develop publication strategies for research software. Defined workflows and clear responsibilities can ensure that research software is published open source in a trustworthy infrastructure, unless exploitation options preclude this.
It should be ensured that software is permanently citable and accessible (e.g., by means of persistent identifiers such as Digital Object Identifiers – DOIs) and that the quality needed for its transparency, reproducibility and reusability is guaranteed by means of suitable review procedures. In order to maintain the competitive advantage of the Helmholtz Centers at which the software is developed, the imposition of an appropriate embargo period when publishing the software is conceivable.
In discussions with publishers and journals, efforts should be made to ensure that the software is cited appropriately, so that, for example, the correct referencing of software versions is guaranteed, and the deployment of the software can be traced across versions.
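The following is a minimal sketch of what version-specific citation could look like in practice. It assumes that each release of a software package is registered with its own DOI. The `SoftwareRelease` record, its field names, and the citation layout are hypothetical and are not prescribed by this paper or by any citation standard. The DOI shown is a placeholder, not a real identifier.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SoftwareRelease:
    """One published release of a research software package (hypothetical record)."""

    authors: Tuple[str, ...]
    title: str
    version: str
    year: int
    doi: str  # DOI of this specific release, without the resolver prefix

    def citation(self) -> str:
        """Return a citation string that names the exact version and its DOI."""
        names = ", ".join(self.authors)
        return (
            f"{names} ({self.year}). {self.title} (Version {self.version}) "
            f"[Computer software]. https://doi.org/{self.doi}"
        )


# Placeholder values for illustration only; the DOI is not a real identifier.
release = SoftwareRelease(
    authors=("Doe, J.", "Roe, R."),
    title="example-model",
    version="1.2.0",
    year=2017,
    doi="10.0000/example.1234",
)
print(release.citation())
```

Because the version number and the release-specific DOI both appear in the citation, a reader can trace which release of the software produced a given result.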
4.4 Infrastructures
Key question: What repositories are suitable for permanently storing research software and actively making it available? Infrastructures such as collaborative platforms for software development or repositories for the long-term storage of software solutions form the technological and organizational foundations for access to and reuse of research software, as well as for its sustainability. The use of such infrastructures ensures that – irrespective of staff turnover – program code and documentation are stored on a long-term basis at a scientific institution or within a scientific community.
In practice, freely available services of commercial providers are often used to develop software. This may lead to dependencies, which can have a negative impact on research in the long term. Moreover, in the area of repositories, there is a lack of reliable and trustworthy infrastructures for the long-term storage of program code that enable the publication of software as a citable product of the scientific knowledge process.
The Helmholtz Centers are advised to address these challenges and to develop strategies that ensure that research software is developed and stored on a long-term basis in trustworthy infrastructures, for example, repositories. Furthermore, the Centers should recommend or provide platforms for collaborative software development that allow them to retain autonomy over software developments and avoid dependence on commercial providers. Here, cooperation with other organizations is also particularly desirable.
When Centers provide infrastructures of their own, attention should be paid to a sustainable personnel policy that ensures that the know-how of the technical staff for operating the infrastructures and supporting researchers is retained at the Centers in the long term.
4.5 Training and Continuing Professional Development
Key questions: How can software development in research be professionalized? What training and continuing professional development (CPD) procedures are required? In the digital age, excellent science requires very well-trained software developers. Whereas in industry, the best minds are wooed with recruitment campaigns, high salaries, and human resources development measures, scientific institutions too rarely offer software developers reliable career paths. This calls for countermeasures. Scientific institutions should offer specialist personnel long-term prospects.
Clear job formats should be created, and software development in research should be professionalized. For this purpose, it is also necessary to anchor programming skills in the initial training of specialized scientists as early as possible, thereby also strengthening their ability to communicate with software developers.
The Centers are advised to address this area of activity in dialogue with higher education institution partners, and to establish training and career paths for software developers at the Centers. Moreover, internal training and CPD measures should be strengthened in order to promote the skills development of excellent software developers.
4.6 Legal Issues
Key question: What licenses ensure authorship and reuse of software in a legally secure framework? Before software is published, the program code must be assessed for dependencies and exploitability. If the aim is to publish the software open source, adequate licensing of the program code is necessary. In the world of science, knowledge about suitable licenses for software and about the implications of these licenses is often lacking. These uncertainties must be resolved. The information infrastructure and the administration should support researchers in licensing software by providing recommendations, information on best practices, and clearly defined processes. The Centers are advised to establish processes that ensure that software intended for publication is published in a legally secure way under standardized licenses.
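One step of such a process could be automated, as the following minimal sketch suggests: before publication, the declared licenses of a program and its dependencies are compared against a list of licenses approved by the institution. The list below is a hypothetical example that uses SPDX license identifiers. It is an assumption made for illustration, not a recommendation of this paper, and the function and component names are likewise invented.

```python
from typing import Dict, List

# Hypothetical allow-list of SPDX license identifiers.
# A Center would define its own list in its guidelines and policies.
APPROVED_LICENSES = frozenset(
    {"Apache-2.0", "BSD-3-Clause", "GPL-3.0-or-later", "MIT"}
)


def is_approved(spdx_id: str) -> bool:
    """Return True if the declared license identifier is on the allow-list."""
    return spdx_id.strip() in APPROVED_LICENSES


def unapproved_components(components: Dict[str, str]) -> List[str]:
    """Return the sorted names of components whose license is not approved."""
    return sorted(
        name for name, license_id in components.items()
        if not is_approved(license_id)
    )


# Invented component names and license declarations, for illustration only.
declared = {
    "example-model": "MIT",
    "example-solver": "Apache-2.0",
    "legacy-module": "Proprietary",
}
print(unapproved_components(declared))
```

A check of this kind only flags components for review. Assessing dependencies and exploitability still requires the legal and transfer expertise described above.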
1Helmholtz-Gemeinschaft (2015). Open Science – Chancen, Herausforderungen und Handlungsfelder. www.helmholtz.de/open-science-subsite/open-science-in-der-helmholtz-gemeinschaft/stakeholder-und-ihre-rollen/selbstverstaendnis-des-arbeitskreises-open-science-der-helmholtz-gemeinschaft/
2See, for example: Barnes, N. (2010). Publish your computer code: it is good enough. Nature, 467(7317), 753–753. doi.org/10.1038/467753a and Ince, D. C., Hatton, L., & Graham-Cumming, J. (2012). The case for open computer programs. Nature, 482(7386), 485–488. doi.org/10.1038/nature10836
3See, for example: Morin, A., Urban, J., Adams, P. D., Foster, I., Sali, A., Baker, D., & Sliz, P. (2012). Shining Light into Black Boxes. Science, 336(6078), 159–160. doi.org/10.1126/science.1218263; Peng, R. D. (2011). Reproducible Research in Computational Science. Science, 334(6060), 1226–1227. doi.org/10.1126/science.1213847
4See, for example: Mozilla Science Lab (https://wiki.mozilla.org/ScienceLab, USA), SciForge (http://www.sciforge-project.org, DE), Software Sustainability Institute (http://software.ac.uk, UK), Run My Code (http://www.runmycode.org, International), rOpenSci (https://ropensci.org)
5See, for example: Bioconductor in biomedicine (http://www.bioconductor.org/) or the Open Source Geospatial Foundation in the geosciences (http://www.osgeo.org)
6See, for example: software journals such as Open Research Software – JORS (http://openresearchsoftware.metajnl.com, Ubiquity Press) or SoftwareX (http://www.journals.elsevier.com/softwarex, Elsevier) and the cooperation between GitHub and Zenodo (https://guides.github.com/activities/citable-code/) as well as between GitHub and Figshare (http://figshare.com/blog/Working_with_Github_and_Mozilla_to_enable_Code_as_a_Research_Output_/117)
7See: Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020. Version 1.0. 11 December 2013. http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
8In the beta version of the “Kerndatensatz Forschung [Research Core Dataset for the German science system],” the recording of software under ID “Pu45” is recommended. See: http://kdsf.fit.fraunhofer.de/beta/tables/table-specification.html
10Report of the Helmholtz Open Science Workshop “Access to and Reuse of Research Software” #hgfos16, November 22–23, 2016 at the Helmholtz Center Dresden-Rossendorf. http://doi.org/10.2312/lis.17.01