PT - JOURNAL ARTICLE
AU - Kyle Lesack
AU - Grace M. Mariene
AU - Erik C. Andersen
AU - James D. Wasmuth
TI - Different structural variant prediction tools yield considerably different results in Caenorhabditis elegans
AID - 10.1101/2022.03.11.483485
DP - 2022 Jan 01
TA - bioRxiv
PG - 2022.03.11.483485
4099 - http://biorxiv.org/content/early/2022/09/15/2022.03.11.483485.short
4100 - http://biorxiv.org/content/early/2022/09/15/2022.03.11.483485.full
AB - The accurate characterization of structural variation is crucial for our understanding of how large chromosomal alterations affect phenotypic differences and contribute to genome evolution. Whole-genome sequencing is a popular approach for identifying structural variants, but the accuracy of widely used tools remains unclear due to the limitations of existing benchmarks. Moreover, the performance of these tools for predicting variants in non-human genomes is less certain, as most tools were developed and benchmarked using data from the human genome. To address this problem, multiple short- and long-read tools were benchmarked using real and simulated Caenorhabditis elegans whole-genome sequence data. To evaluate the use of long-read data for the validation of short-read predictions, the agreement between predictions from a short-read ensemble learning method and those from long-read tools was compared. The results obtained from simulated data indicate that the best-performing tool is contingent on the type and size of the variant, as well as the sequencing depth of coverage. These results also highlight the need for reference datasets generated from real data that can be used as ‘ground truth’ in benchmarks. Competing Interest Statement: The authors have declared no competing interest.