Abstract
What is the source of new genes? What fuels genetic innovation, the substrate of long-term adaptation? Gene acquisition by de novo emergence from previously non-coding sequences has long been considered highly improbable1. New genes were assumed to appear mostly by gene duplication and divergence2 or by horizontal gene transfer3. In the last decade, only a handful of de novo genes have been functionally characterized4,8, exemplifying their contribution to evolutionary innovations. However, the quantitative importance of de novo emergence and a proper description of its dynamics are still lacking, mainly because it is difficult to distinguish de novo candidates from highly diverged homologs, from wrongly annotated protein-coding genes, and from genes acquired horizontally from remote species. Here we address these issues with a multi-level systematic approach that carefully selects de novo candidates among a set of genes taxonomically restricted to yeast genomes. We predict 703 de novo genes in 15 yeast genomes whose phylogeny spans at least 100 million years of evolution9. We have validated 82 candidates: for 25 of them we provide new translation evidence through mass spectrometry experiments, and for the others translation has been independently reported previously. Our results establish that de novo gene emergence is a widespread phenomenon in the yeast subphylum, with only a few de novo genes being ultimately maintained by selection. As we found that de novo genes preferentially arise in GC-rich intergenic regions transcribed from divergent promoters, such as recombination hotspots, we propose a mechanistic model for the early stages of de novo gene emergence and evolution in eukaryotes.