Abstract
The pangenome is the set of all genes present in a prokaryotic species. Most pangenomes contain many accessory genes that are present in only some members of the species. Genes need to function together, and it has been suggested that selection for certain gene combinations affects the structure of prokaryotic pangenomes. However, genes can also co-occur simply because they are linked on the genome, and efficient tools are needed to distinguish linkage from co-selection. Here we present Goldfinder, an approach that infers co-occurrence and co-avoidance between gene pairs while taking the phylogenetic relationships of the species into account. The approach is implemented in an efficient Python script available at https://github.com/fbaumdicker/goldfinder. We also provide scripts for clustering co-occurring genes and for visualizing the resulting co-occurrence and co-avoidance networks in Cytoscape. Compared with the co-occurrence inference tool Coinfinder, Goldfinder finds fewer co-occurring pairs in a real species pangenome, suggesting that it detects fewer spurious associations caused by phylogenetic dependencies. We conclude that Goldfinder is a fast and accurate tool for inferring gene co-occurrence and co-avoidance, which will enable large-scale analyses to identify co-selected genes across bacterial pangenomes.
Competing Interest Statement
The authors have declared no competing interest.