Abstract
Crops lose genetic variation through strong founder effects during domestication, which leads them to accumulate and potentially expose recessive deleterious alleles. Identifying these deleterious variants in domesticated varieties, and their functional orthologs in wild relatives, is therefore key for plant breeding, food security and rescuing the biodiversity of cultivated crops. We explored a machine learning strategy to estimate the impact of new and existing mutations in plant genomes, leveraging multi-omics data that encompass genomic, epigenomic and transcriptomic information. Specifically, we applied a support-vector-machine framework, previously used on animal datasets, to published omics data for two important crops of the genus Solanum, tomato and potato, and for the model plant Arabidopsis thaliana. We show that our approach yields biologically plausible inferences about the role of mutations occurring in different genomic regions, and predictions that correlate with natural genetic variation in all three species, supporting the validity of our estimates. Finally, we show that our estimates outperform existing methods that rely exclusively on phylogenetic conservation and do not exploit the omics data available for crop species. This approach provides a simple score that researchers can use to prioritize variants for gene editing and breeding.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Email: avi.levy{at}weizmann.ac.il, fabrizio.mafessoni{at}weizmann.ac.il