Abstract
Motivation: It has been proposed that future reference genomes should be graph structures in order to better represent the sequence diversity present in a species. However, there is currently no standard method for representing genomic intervals, such as the positions of genes, on graph-based reference genomes.
Results: We formalize offset-based coordinate systems on graph-based reference genomes and introduce a method for representing intervals on these reference structures. We show the advantage of our method by representing genes on a graph-based representation of the human reference genome GRCh38 together with its alternative loci, which cover highly variable regions.
Conclusion: More complex reference genomes that contain alternative loci require methods for representing genomic data on these structures. Our proposed notation for genomic intervals makes it possible to fully utilize the alternative loci of GRCh38 and of potential future graph-based reference genomes. We illustrate our notation for genomic intervals, as well as the offset-based coordinate systems, through a web tool available at https://github.com/uio-cels/gen-graph-coords.