ABSTRACT
Background In vivo transposon mutagenesis coupled with deep sequencing enables large-scale, genome-wide mutant screens for genes essential in different growth conditions. Six large-scale studies have now been performed in three yeast species (S. cerevisiae, S. pombe and C. albicans), each mutagenized with two of three different heterologous transposons (AcDs, Hermes, and PiggyBac).
Results We analyzed predictions of gene essentiality for each of the six studies and evaluated the ability of the data to predict gene essentiality using a machine-learning approach. Important data features included a sufficient number of independent insertions and the degree to which insertions were randomly distributed. All transposons showed some bias in insertion site preference, owing to jackpot events, specific insertion sequence preferences, and preferences for short-range vs long-range insertions. For PiggyBac, a stringent target sequence limited the ability to predict essentiality in genes with few or no target sequences. Furthermore, the machine-learning approach is robust for predicting gene function in less well-studied species by leveraging cross-species orthologs. Finally, comparisons of isogenic diploid vs haploid S. cerevisiae isolates identified several genes that are haplo-insufficient, while most essential genes, as expected, were recessive.
Conclusions We provide recommendations for the choice of transposons and the inference of gene essentiality in genome-wide studies of eukaryotic microbes such as yeasts, including species that have been less amenable to classical genetic studies. These include maximizing the number of unique insertions, avoiding transposons with stringent target sequences, and using a method for cross-species transfer learning.