Abstract
Motivation The amorphous nature of genes combined with the prevalence of duplication events makes establishing correct genetic phylogenies challenging.
Since homologous gene groups are traditionally formed on basis of sequence similarity, both orthologs and paralogs are often placed in the same gene group by existing tools. Certain tools such as PoFF take syntenic relationship of genes into consideration when forming gene groups. However, a method to form gene groups consisting of only true syntelogs has not yet been developed.
In order to obtain orthologous gene groups consisting of the most likely syntelogs we need a method to filter out paralogs. If one strain has two or more copies of the same gene in a gene group we want to keep only the true syntelog in the group, and remove the paralogous copies by distinguishing between the two using synteny analysis.
Results We present a novel algorithm for measuring the degree of synteny shared between two genes and successfully disambiguate gene groups. This synteny measure is the basis for a number of other useful functions such as gene neighbourhood visualisation to inspect suspect gene groups, strain visualisation for assessing assembly quality and finding genomic areas of interest, and chromosome/plasmid classification of contigs in partially classified datasets.
Availability The latest version of Syntenizer 3000 can be downloaded from the GitHub repository at https://github.com/kamiboy/Syntenizer3000/
Consult the manual.pdf file in the repository for instructions on how to build and use the tool, as well as a in depth explanation of the algorithms utilised.