The ReproVIP project
aims at evaluating and improving the reproducibility of scientific
results obtained with the
Virtual Imaging Platform (VIP)
in the field of medical imaging. ReproVIP focuses on a reproducibility level ensuring that the code produces the same result when executed with the same set of inputs, and that an investigator is able to obtain the published results again. We investigate reproducibility at three levels, ranging from single code executions to the overall scientific exploration process.
This web page corresponds to the ReproVIP deliverable documenting the reproducibility best practices and the tools developed by the project.
In the last few years, there has been a growing awareness of
reproducibility concerns in many areas of science. In medical
imaging, the increasing complexity of the software and pipelines
undermines our ability to reproduce scientific results over time
and across research teams. Here we focus on computational
reproducibility, that is, the ability to obtain identical results by applying identical processing to the same set of inputs.
The lack of computational reproducibility mainly stems from library dependencies and their evolution over time, but also from numerical instability due to floating-point arithmetic issues (rounding errors, hardware and compiler optimizations).
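As a minimal illustration of the floating-point issue, the Python snippet below shows that simply changing the order of a few additions, which different hardware or compiler optimizations may do, already changes the last bits of the result:

```python
# Floating-point addition is not associative: the summation order,
# which may differ across compilers, optimization levels, or hardware,
# changes the low-order bits of the result.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

print(left == right)  # False
print(left, right)
```

Accumulated over the millions of operations of an image-analysis pipeline, such last-bit differences can propagate and become visible in the final results.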
In
this paper
presented at the ISBI 2023 conference, we evaluated the
reproducibility of tumor segmentation outcomes produced with a
deep segmentation model when MRI images are pre-processed (i) with
two different versions of the same pre-processing pipeline, and
(ii) by introducing numerical perturbations that mimic executions
on different environments. Experiments highlighted the fact that
the results were not bitwise reproducible. The inter-environment
variability measured by introducing numerical perturbations was of the same order of magnitude as the between-version variability, suggesting that the underlying cause of this variability may be numerical instability. The variability in segmentation outcomes also
depended on the input data. This data sensitivity may introduce a
bias in model performance at the patient level.
In the last few years, the main approach to avoid software
variability has been to containerize software using Docker or
Singularity. However, software containers do not control for
hardware heterogeneity. In
this paper
submitted to the
ACM REP 2024
conference, we studied the effect of hardware variability on
linear registration results produced by a neuroimaging application
packaged with Docker and Guix. Results showed that hardware, software, and numerical variability led to perturbations of similar magnitudes, although uncorrelated, suggesting that these three types of variability act as independent sources of numerical noise. The effect of hardware perturbations on linear registration remained moderate, but might impact downstream analyses when linear registration is used as an initialization step for other operations.
Moving towards the exploration process level, we studied the
reproducibility of metabolite quantification with
in vivo MR spectroscopy, paying attention to the reproducibility of the study itself. The code and data used to produce the results have thus been made available on our GitLab repository. In this study,
the variability of outcomes was monitored at three distinct
levels: (i) between two quantification algorithms, accounting for
inter-software variability; (ii) between two sets of parameters
for each software, accounting for model flexibility; and (iii)
across multiple executions for each software-parameter set,
accounting for random seeds in the fitting process. Results showed substantial variations at all three levels, enabling more robust analyses than those based on a single parameter set or a single software package.
The ReproVIP project has identified a set of best practices that can help improve the reproducibility of computational experiments. These best practices cover different aspects of the research process, from experimental design to data analysis and reporting. The list is not exhaustive and is mainly based on the bibliography provided here.
Transparency refers to the openness and clarity with which the research process and its outcomes are communicated and made accessible to others. Its main goal is to allow a critical reader to evaluate the work and fully understand its strengths and limitations. This generally requires the full disclosure of the methods, the raw and manipulated data, as well as all the code used for processing the data.
Sharing code and data is the first and foremost strategy available to maximize the transparency of research methods. It is also essential for the reproducibility and replicability of the study. Platforms like GitHub or Zenodo can be used for this purpose. If ethical or privacy constraints hinder accessibility, a detailed description should be provided instead.
Document your code thoroughly with comments, docstrings, and README files. Explain the purpose of each function, the logic behind complex algorithms, and how to run the code. Follow standards such as the FAIR principles (Findable, Accessible, Interoperable, Reusable) to ensure that your research outputs are well-documented and accessible to others.
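As a small, hypothetical illustration of function-level documentation (the function and its parameters are invented for the example):

```python
def normalize_intensity(image, mask=None):
    """Rescale voxel intensities of an MRI volume to the [0, 1] range.

    Parameters
    ----------
    image : numpy.ndarray
        3D array of voxel intensities.
    mask : numpy.ndarray, optional
        Boolean array of the same shape; when given, the minimum and
        maximum are computed inside the mask only.

    Returns
    -------
    numpy.ndarray
        The rescaled volume, same shape as `image`.
    """
    values = image[mask] if mask is not None else image
    low, high = values.min(), values.max()
    return (image - low) / (high - low)
```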
Version control is essential for both code and data. Use version control systems like Git to track changes in code, data, and documentation. This ensures that changes are logged and reversible, facilitating collaboration and reproducibility.
Explicitly specify dependencies, including libraries, packages, and their versions. Containerization tools like Docker make it possible to encapsulate your environment, including dependencies and configurations, so that your code runs consistently across different systems. Guix makes it possible to build reproducible computational environments that accurately document the software build chain and its dependencies.
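Beyond containers, it also helps to record the exact versions of the key packages alongside your results. A minimal sketch in Python (the package list and output file name are only examples):

```python
# Record the versions of the packages used for an analysis so that the
# exact software environment can be reported alongside the results.
from importlib.metadata import version, PackageNotFoundError
import json
import platform
import sys

packages = ["numpy", "nibabel", "scipy"]  # example list, adapt to your project

env = {
    "python": sys.version,
    "platform": platform.platform(),
    "packages": {},
}
for name in packages:
    try:
        env["packages"][name] = version(name)
    except PackageNotFoundError:
        env["packages"][name] = "not installed"

with open("environment.json", "w") as f:
    json.dump(env, f, indent=2)
```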
Continuous integration pipelines make it possible to automate testing and ensure that changes do not break reproducibility. Services like Travis CI, CircleCI, GitLab CI or GitHub Actions can help in this regard.
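A typical reproducibility check that such a pipeline can run is a test comparing the checksum of a freshly computed output against a reference value; a minimal sketch (paths and the reference hash are placeholders):

```python
# test_reproducibility.py -- run by the CI pipeline (e.g. with pytest).
# Checks that a regenerated analysis output is bitwise identical
# to a previously validated reference result.
import hashlib
from pathlib import Path

REFERENCE_SHA256 = "replace-with-the-hash-of-the-validated-output"  # placeholder

def sha256(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def test_output_is_bitwise_reproducible():
    # The output file is assumed to have been regenerated by a previous CI step.
    assert sha256("results/segmentation.nii.gz") == REFERENCE_SHA256
```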
Workflows streamline the execution of computational tasks by defining a series of steps or processes in a systematic and automated manner. Tools like Nipype or Nextflow can manage complex data analyses, enhancing reproducibility by reducing manual errors and making the entire process easy to replicate.
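As an illustration, here is a minimal Nipype workflow chaining two pure-Python steps; the step functions are hypothetical placeholders for real processing interfaces:

```python
# Minimal Nipype workflow: two Function nodes chained together.
from nipype import Node, Workflow
from nipype.interfaces.utility import Function

def add_one(x):
    return x + 1

def double(x):
    return x * 2

n1 = Node(Function(input_names=["x"], output_names=["out"], function=add_one),
          name="add_one")
n2 = Node(Function(input_names=["x"], output_names=["out"], function=double),
          name="double")

wf = Workflow(name="demo", base_dir="/tmp/nipype_demo")
wf.connect(n1, "out", n2, "x")
n1.inputs.x = 3

wf.run()  # executes add_one then double; every step is logged and cached
```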
Literate programming integrates code with comprehensive documentation and narrative explanations, often within the same document. Tools like Jupyter Notebooks or R Markdown enable researchers to combine executable code with rich text, figures, and mathematical equations. This transparency makes it easier for others to understand, verify, and reproduce the research.
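The same idea also works with plain scripts: the "percent" cell format below, supported by Jupytext and most notebook-aware editors, lets a Python script alternate narrative cells and executable cells (the analysis content is a placeholder):

```python
# %% [markdown]
# # Reproducibility of metabolite quantification
# This cell documents the rationale of the analysis in plain text.

# %%
# This cell contains the executable code for the analysis.
results = [0.98, 0.97, 0.99]          # placeholder values
mean_score = sum(results) / len(results)
print(f"Mean reproducibility score: {mean_score:.2f}")
```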
The ReproVIP project has developed a set of tools to help researchers improve the reproducibility of their experiments. Here is a selection of tools and examples that can be used to implement the best practices identified by the project.
The VIP Python client makes it possible to automate the execution of applications on VIP and can be easily integrated into Jupyter notebooks. Application examples can be found in this repository, including tutorials and workshop materials.
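As a hedged sketch of what automating a VIP execution from a notebook could look like; the import path, class and method names below (VipSession, upload_inputs, launch_pipeline, monitor_workflows, download_outputs) and the pipeline identifier are assumptions made for illustration and should be checked against the client's repository:

```python
# Illustrative sketch only: names below are assumptions, check the
# VIP Python client documentation for the actual interface.
from vip_client import VipSession            # assumed import path

VipSession.init(api_key="VIP_API_KEY")       # authenticate against VIP
session = VipSession("repro-demo")           # hypothetical session name
session.upload_inputs("./inputs")            # send local data to VIP
session.launch_pipeline("ExamplePipeline/1.0")  # hypothetical application id
session.monitor_workflows()                  # wait for executions to finish
session.download_outputs("./results")        # retrieve the results locally
```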
The dashboard allows for the visual interpretation of results. It also helps to detect differences, i.e. possible reproducibility issues, through a number of application-specific reproducibility metrics.
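The Dice coefficient between the segmentations produced by two executions is one example of such an application-specific metric; a minimal sketch with NumPy (array shapes and threshold are illustrative):

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice overlap between two binary segmentation masks (1.0 = identical)."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * intersection / total

# Compare the outputs of two executions of the same segmentation pipeline.
run1 = np.random.rand(64, 64, 64) > 0.5   # placeholder for a real segmentation
run2 = run1.copy()
print(dice(run1, run2))                    # 1.0 -> bitwise-identical masks
```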
The ReproVIP GitLab repository is used to automate the execution of reproducibility tests and the evaluation of the reproducibility of results. It is integrated with the dashboard.
VIP deliverables
Web resources