PT - JOURNAL ARTICLE AU - Niu, Yi Nian AU - Roberts, Eric G. AU - Denisko, Danielle AU - Hoffman, Michael M. TI - Assessing and assuring interoperability of a genomics file format AID - 10.1101/2022.01.07.475366 DP - 2022 Jan 01 TA - bioRxiv PG - 2022.01.07.475366 4099 - http://biorxiv.org/content/early/2022/01/07/2022.01.07.475366.short 4100 - http://biorxiv.org/content/early/2022/01/07/2022.01.07.475366.full AB - Background Bioinformatics software tools operate largely through the use of specialized genomics file formats. Often these formats lack formal specification, and only rarely do the creators of these tools robustly test them for correct handling of input and output. This causes problems in interoperability between different tools that, at best, wastes time and frustrates users. At worst, interoperability issues could lead to undetected errors in scientific results.Methods We sought (1) to assess the interoperability of a wide range of bioinformatics software using a shared genomics file format and (2) to provide a simple, reproducible method for enhancing inter-operability. As a focus, we selected the popular Browser Extensible Data (BED) file format for genomic interval data. Based on the file format’s original documentation, we created a formal specification. We developed a new verification system, Acidbio (https://github.com/hoffmangroup/acidbio), which tests for correct behavior in bioinformatics software packages. We crafted tests to unify correct behavior when tools encounter various edge cases—potentially unexpected inputs that exemplify the limits of the format. To analyze the performance of existing software, we tested the input validation of 80 Bioconda packages that parsed the BED format. We also used a fuzzing approach to automatically perform additional testing.Results Of 80 software packages examined, 75 achieved less than 70% correctness on our test suite. We categorized multiple root causes for the poor performance of different types of software. Fuzzing detected other errors that the manually designed test suite could not. We also created a badge system that developers can use to indicate more precisely which BED variants their software accepts and to advertise the software’s performance on the test suite.Discussion Acidbio makes it easy to assess interoperability of software using the BED format, and therefore to identify areas for improvement in individual software packages. Applying our approach to other file formats would increase the reliability of bioinformatics software and data.Competing Interest StatementThe authors have declared no competing interest.