Reproducibility in bioinformatics workflows applied to genomics and clinical oncology

Authors

  • Juan Sebastian Loza Chiriboga, Universidad Internacional de Valencia, Facultad de Ciencias de la Salud, Bioinformatics, Valencia, Spain.
  • Valeria Alexandra Riera Sampedro, Universidad Nacional de Chimborazo, Facultad de Ciencias de la Salud, Riobamba, Chimborazo, Ecuador. Postal code: 060106. Email: valeria.riera@unach.edu.ec
  • Michael Gustavo Miranda Coello, General Practitioner, Clínica Los Pinos - Hospital General, Quito, Pichincha, Ecuador. Postal code: 170125. Email: md.gustavo25@gmail.com
  • Dennys Rodrigo Lopez Chavez, General Dentist, Independent Researcher, Riobamba, Chimborazo, Ecuador. Postal code: 060106. Email: dennysrodrigolopez@gmail.com

DOI:

https://doi.org/10.47187/cssn.Vol16.Iss2.453

Keywords:

bioinformatics, genomics, oncology, reproducibility, Snakemake

Abstract

Introduction: The exponential growth of genomic data in clinical oncology has made biological reproducibility a critical pillar for the validity of personalised diagnoses and therapies. However, variability between pipelines and the lack of technical standards are causing a crisis of consistency in results. Objective: To analyse bioinformatics workflows applied to oncology genomics and to evaluate the capacity of tools such as Snakemake, Onkopipe and iCOMIC to guarantee reproducible clinical results. Methodology: A systematic literature review was conducted in accordance with the PRISMA 2020 statement. The search covered the PubMed and ScienceDirect databases. Of the 124 records identified, 38 studies met the eligibility criteria and were analysed using a qualitative, thematic synthesis. Results: The review identified a reproducibility crisis linked to inadequate documentation of parameters. Of the studies analysed, 18 described explicit practices: 88.8% (n=16) highlighted detailed documentation, 77.7% (n=14) the use of workflow managers such as Snakemake, and 61.1% (n=11) the implementation of containers and version control. Twelve specific workflows were evaluated, and the use of Snakemake was found to optimise the traceability and scalability of molecular diagnostics. Conclusions: The rigorous adoption of automated workflow managers and the standardisation of technical documentation are essential for the transition from research bioinformatics to auditable and reliable clinical genomics.

Published

2026-01-25

How to Cite

Loza Chiriboga, J. S., Riera Sampedro, V. A., Miranda Coello, M. G., & Lopez Chavez, D. R. (2026). Reproducibility in bioinformatics workflows applied to genomics and clinical oncology. LA CIENCIA AL SERVICIO DE LA SALUD Y NUTRICIÓN, 16(2), C_155–162. https://doi.org/10.47187/cssn.Vol16.Iss2.453

Section

Literature reviews
