Large portion of essential genes is missed by screening either fly or beetle indicating unexpected diversity of insect gene function

Most gene functions were detected by screens in very few model organisms but it has remained unclear how comprehensive these data are. Here, we expanded our RNAi screen in the red flour beetle Tribolium castaneum to cover more than half of the protein-coding genes and we compared the gene sets involved in several processes between beetle and fly. We find that around 50 % of the gene functions are detected in both species while the rest was found only in fly (~10%) or beetle (~40%) reflecting both technical and biological differences. We conclude that work in complementary model systems is required to gain a comprehensive picture on gene functions documented by the annotation of novel GO terms for 96 genes studied here. The RNAi screening resources developed in this project, the expanding transgenic tool-kit and our large-scale functional data make T. castaneum an excellent model system in that endeavor.


Introduction
Only in a very small number of genetic model species like the mouse Mus musculus, the zebrafish 46 Danio rerio, the nematode Caenorhabditis elegans and the vinegar fly Drosophila melanogaster have 47 the functions of most genes been assayed in systematic screens. This restriction to few model 48 systems is a consequence of the necessity for an elaborate genetic and molecular tool kit, which is 49 extremely laborious to establish ( for understanding the evolution of biodiversity but also for applied research, e.g. for transferring 54 knowledge from model systems to species relevant for medical applications or pest control. 55 Recently, the study of gene function has been extended to non-traditional model organisms. 56 Predominantly Notably, the differences in gene functions documented so far may be an underestimation of the real 71 divergence, because the prevailing candidate gene approach leads to a systematic bias towards 72 conservation. The genes to be tested are usually chosen based on the knowledge of their ortholog's 73 involvement in other species. As a consequence, unrelated genes are rarely tested and the 74 involvement of unexpected genes in a given process is underestimated. Hence, approaches are 75 needed to overcome this bias and to gain a realistic view on the degree of gene function divergence. 76 To that end, genes required for certain biological processes need to be identified in an unbiased and 77 genome-wide manner also in non-traditional organisms, even though this has remained technically 78 challenging. 79 In this paper, we used an expanded dataset to assess the degree of divergence of the gene sets 88 involved in selected developmental processes between fly and beetle such as head, muscle and ovary 89 development, and dorso-ventral patterning. First, we determined genes that were essential in the 90 beetle for these processes but which had so far not been connected to them in D. melanogaster. 91 These a priori unexpected genes sum up to about 37% of the total genes identified to be involved in 92 either one or both species. For 30% of these genes, no functional annotation had been available at 93 FlyBase at all such that we provide the first functional Gene Ontology (GO) assignment for the 94 respective ortholog group in insects. Only two genes essential in T. castaneum did not have an 95 ortholog in D. melanogaster, i.e. these processes seem not much affectd by gene gain or loss. We 96 conclude that restricting genetic screens to one model system only, falls short of identifying a 97 comprehensive set of essential genes. Further, our data reveals an unexpected degree of divergence 98 of gene function between two holometabolous insect species. We also present here an update of the 99 dataset gained in the genome wide iBeetle screen in T. castaneum. Our analysis is based on both, a 100 dataset previously published comprising 5.300 genes (Schmitt-Engel et al., 2015) and an additional 101 3.200 genes screened as part of this project. In addition to those, we also make accessible (at iBeetle-102 Base) the phenotypes for an additional 4,520 genes which were screened while the analysis 103 presented here was ongoing. Hence, with this paper, the coverage of genes tested and annotated at 104 iBeetle-Base sums up to 13.020 Tribolium genes (78 % of the predicted gene set). 105

107
Continuation of the large scale iBeetle screen 108 We added 3,200 genes to the previously published 5,300 genes of our large scale iBeetle screen 109 (Schmitt-Engel et al., 2015), reaching a coverage of 51% of the T. castaneum gene set of total 16,593 110 currently annotated genes (Herndon et al., 2020). We followed the previously described procedure 111 for the pupal injection screen (Schmitt-Engel et al., 2015) with minor modifications (see methods). In 112 short, we injected 10 female pupae per gene with dsRNAs (concentration 1ug/ul). We annotated the 113 phenotypes of the injected animals and the first instar cuticle of their offspring using the EQM the first part of the screen ( Fig. 1 and Table S1). The detailed analysis presented below was based on 119 this set of genes covering approximately 50% of the genome. In parallel, we continued the screen 120 and have in the meanwhile reached a coverage of 78 % (13,020 genes). We publish these additional 121 phenotypic data (accessible online at iBeetle-Base) with this article, but they were not included in the 122 detailed analysis presented here because both analyses ran in parallel. 123 124 125 126 Unexpected gene functions in developmental processes 127 We wanted to use our large-scale phenotypic dataset to systematically compare the gene sets 128 involved in the same biological processes in T. castaneum and D. melanogaster. To that end, we first 129 identified in an unbiased way all genes involved in a number of biological processes by searching 130 iBeetle-Base. Specifically, we scored for phenotypes indicative of functions in dorso-ventral 131 patterning, head and muscle development, in oogenesis, and epithelial adhesion in wings (wing 132 blister phenotypes). For all these processes, we found a number of gene functions that were 133 expected based on D. melanogaster knowledge (see below). This confirmed that the screen design 134 allowed detection of respective phenotypes. Importantly, we also found functions for genes so far 135 not connected to those processes (based on FlyBase information, PubMed searches and scientist 136 expertise). The iBeetle screen is a first pass screen with a focus on minimizing false negative results 137 likelihood for this type of error is further increased by off-target effects and/or by strain specific 139 differences in the phenotype (Kitzmann et al., 2013). Hence, we aimed at excluding false positive 140 annotations for the unexpected gene functions. First, we based our analyses only on genes for which 141 phenotypes had been annotated with a penetrance of >50% in the primary screen. Further, we only 142 used phenotypes that were reproduced by RNAi experiments with non-overlapping dsRNA fragments 143 targeting the same gene. In order to exclude genetic background effects, we used another lab strain 144 (our standard lab strain San Bernardino, SB) except for the muscle project where we needed to use 145 the pBA19 strain, which has EGFP marked muscles (Lorenzen et al., 2007). This re-screening 146 procedure resulted in a set of genes for which we can claim with high confidence that they are 147 indeed involved in these processes in T. castaneum -but which previously were not assigned to these 148 in D. melanogaster (Supplementary Table S2). 149 Assigning the first function to a gene versus extending previous annotations 150 One reason for a lack of respective functional data in FlyBase could be that the knocked-down beetle 151 gene does not have an ortholog in the fly. In order to test this hypothesis, we searched for the fly 152 orthologs in orthoDB and by manually generating phylogenetic trees based on searching T. 153 castaneum, D. melanogaster and M. musculus genomes for orthologs and paralogs. This analysis 154 revealed that only three genes with a novel function (appr. 3%) did not have a D. melanogaster 155 ortholog (yellow in Fig. 2). Evidently, lineage-specific gene loss or gain explains only a minor part of 156 the functional divergence of homologous developmental processes. 157 Next, we asked whether the respective D. melanogaster orthologs were known to be involved in 158 other biological processes or lacked any phenotype information. To that end, we looked up 159 phenotype information of the respective D. melanogaster orthologs on FlyBase (analysis done with 160 OrthoDB v9). Among the fly orthologs whose functional annotations did not match with those from 161 the iBeetle screen or published record, around two thirds (64.6 %) had annotations that were related 162 to other processes than the ones studied in T. castaneum (Fig. 2). Importantly, one third of the genes 163 (32.3 %) did not have any functional annotation in FlyBase. Hence, for those genes, the iBeetle-screen 164 had detected the first documented function of that ortholog group in insects. Importantly, due to the 165

Figure 2 Analysis of genes with unexpected gene functions
A) Numbers of genes with unexpected function in the respective process. B) Combined numbers for all four processes. Only a small portion of genes with novel gene functions did not have orthologs in Drosophila (yellow). About two-thirds of the genes had previous phenotypic annotations relating to other biological processes (blue). For one third of those genes, we had detected the first phenotype for this gene within insects (green). lack of previous phenotypic information, these genes likely would not have been included in a 166 classical candidate gene approach. 167 168 A quarter of Drosophila gene function annotations were not confirmed for T. 169 castaneum 170 In a complementary approach, we asked how many genes known to be involved in a given process in 171 D. melanogaster had been assigned related functions in the iBeetle screen. To that end, we first 172 collected lists of genes involved in those processes based on D. melanogaster knowledge (expert 173 knowledge, literature and FlyBase) (Table S3). Then we mined iBeetle-Base to see how many of the 174 beetle orthologs had an annotation related to that process (Fig. 3A). About two-thirds of those genes 175 had actually been screened in T. castaneum (Fig. S1) and all following numbers are based on the 176 analysis of this subset. 177 A surprisingly large portion of genes (26.4%) known to be involved in these processes in D. 178 melanogaster did not show the expected phenotype in T. castaneum ( Figure 3B).

199
To assess the sensitivity and reliability of the screen, and also to test the accuracy of each screener, 200 we included approximately 5% positive controls from a set of 35 different genes. By and large, we 201 used the same positive controls as in the first screening phase (see Table Table_S1_controls). Tc-zen-202 1 was excluded since the phenotypes were much weaker than in the previous screen, probably due 203 to degradation of the dsRNA. We added new positive controls to score for muscle and stink-gland 204 phenotypes, which we took from novel genes detected in the first screening phase. Muscle 205 phenotypes iB_06061, iB_05796, iB_03227, iB_01705; stink gland and ovary phenotypes: iB_02517. 206 Head defects: iB_05442 (that gene was not scored for its stink gland phenotype because it turned out 207 to be too mild to be identified reliably in high throughput). In 143 cases (80.8%, n=177), the 208 phenotypes of positive controls were fully recognized (for comparison: in the first screening phase 209 the respective numbers were: 90%, n=201). In 14 cases (7.9%; phase 1: 4%) the phenotype was 210 partially recognized. This category includes complex phenotypes where half (one of two aspects: 211 knirps, piwi, SCR, cta, cnc, iB_01705, iB_05442) or two of three aspects (aristaless) of all phenotypic 212 aspects were correctly identified. 13 phenotypes were missed completely (7.3%, phase 1: 4% ). Tc-213 metoprene tolerant (Tc-met) was missed most frequently, probably due to the fact that the 214 embryonic leg phenotype was very subtle and in addition, the penetrance of the phenotype 215 appeared to be lower than in the first screen (penetrance: less than 30%). Seven positive controls 216 (4%, phase 1: 1%) could not be analyzed due to prior technical lethality, i.e. the premature death of 217 the injected pupae prevented the detection of the phenotype. In three cases wrong aspects were 218 annotated (false positive: 1.7%). Depending on the other annotations these positive controls were 219 valued as partially recognized (SCR) or missed (met, CTA). Find more details in Table  220 Table_S1_controls. 221 Negative controls (buffer injections) were mainly annotated correctly (no phenotype in 92.9%; phase 222 1: 96%) and just in 7 cases led to false positive annotations (7.1%; phase 1: 2%) (Table  223 Table_S1_controls; sheet 2). 224

225
Re-screening of selected iBeetle candidates involved in a number of biological processes was 226 performed in order to probe for off-target and strain-specific effects. For that purpose, two 227 independent dsRNA fragments (original iB-fragments and one non-overlapping fragment, both at 228 concentration 1 μg/μl) of the same gene were injected separately into a different genetic background 229 (San Bernardino, SB strain), except for the muscle project where it is required to use the pBA19 strain 230 with EGFP marked muscles. The rest of the injection procedures and analyses were as in the first 231 pupal injection experiment (see details in Materials and Methods). 232

233
The   Fig. 4A). Hence, our current 268 knowledge based on screening in one species appears to be much less comprehensive than 269 previously thought. We believe that the different proportions of genes detected in only one species 270 (11% vs. 37%) may reflect both, biological and technical differences (see below). 271 In summary, despite some uncertainties with respect to the exact numbers (see discussion below), 272 our findings provide a compelling argument that focusing on single model species falls short of 273 comprehensively revealing the genetic basis of biological processes in any given clade. Further, it 274 shows that T. castaneum is an extremely useful screening system for insect biology, able to reveal 275 novel gene functions even in processes that have been studied intensely in D. melanogaster. 276

277
Estimating the portions of gene functions revealed in fly versus beetle 278 Our beetle data are based on both, our systematic screening of 51% of the T. castaneum gene set 279 and on previous candidate gene work. With respect to fly data, we rely on information available on 280 FlyBase and our expert knowledge of the processes under scrutiny. Given these different kinds of 281 sources and approaches, the data are prone to various types of uncertainties. Therefore, we discuss 282 the way we combined the numbers to calculate our estimation. Subsequently, we will discuss some 283 uncertainties and in how far they influence the estimation. 284 Of the genes known from D. melanogaster to be involved in the processes investigated here (n = 132; 285 see Table S4), we could compare 66% to iBeetle data (column 1 in Fig. 4A; based on Fig. S1; n = 87). 286 Of those genes, 26% (n = 23) were not involved in that process in T. castaneum (column 2 in Fig. 4A; 287 based on Fig. 3). For our overall estimation, we extrapolated this share to the total number of genes 288 involved in the fly (hatched lines from column 2 to column 4). A number of gene functions detected 289 in the iBeetle screen had not been assigned such functions in D. melanogaster before (column 3 in 290

Figure 4 Many genes are detected only in one of the species in the same processes
Combining genes found in fly (column 1) and/or beetle (column 3) leads to the currently known insect gene set for the processes analysed here. Portions shown in column 1 and 2 are based on Fig.  S1, Fig.2 and Fig. 3. We calculate the portions of genes of the combined insect gene set (column 4), which were detected only in Drosophila (11 %), only in Tribolium (37%) or in both (52%). See text for details and discussion of potential systematic biases. B) Respective values for the single processes show that the minimum contribution of the Tribolium screening platform amounted to 20% genes not detected in Drosophila. See table S4 for calculations. Neither model species is able by itself to detect "the insect gene set".  Fig. 2). When combining these numbers, we aimed at providing a minimum 291 estimation for divergence of detected gene functions (Column 4 in Fig. 4A). To be conservative, we 292 assumed that all gene functions known from D. melanogaster but not yet tested in the iBeetle screen 293 would fall into the class of genes being involved in both species (see numbers in green square in 294  Table S4). Further, we scored each signaling pathways as one case (finding mostly conservation) even 295 if single components of these pathways had not divergent phenotypes. This conservative assumption 296 leads to the abovementioned minimum estimation of divergence in these gene sets (Column 4 in Fig.  297 4A; calculation in Table S4). Of all genes currently known to be involved in one of the processes we 298 studied, the portion of genes detected exclusively in the fly (11%; n = 23) is much smaller than the 299 one detected only in the beetle (37%; n = 76) while the analogous function of half of the genes (52%; 300 n = 109) is detected in both species. 301 With this work, we present the first and a quite extensive dataset to estimate this kind of numbers. 302 Still, some confounding issues need to be considered. The first uncertainty stems from the fact that 303 the beetle data is based on testing about 50 % of the genes. In the second part of the screen, we had 304 prioritized genes that were e.g. highly expressed, showed sequence conservation and had GO 305 annotations. The prioritization apparently was successful as 66% of the gene functions known from 306 D. melanogaster had been covered in the iBeetle screen (Fig. 4A), which is much more than the 40% 307 expected for an unbiased selection (Schmitt-Engel et al., 2015). Hence, our figures might be biased 308 towards conserved gene function. As a consequence, the overall portion of beetle specific genes 309 without conserved functions likely is even higher than reflected in Fig. 4A. 310 Second, we found quite different numbers for the four processes under scrutiny (Fig. 4B). However, 311 even in the process with the lowest portion of genes detected exclusively in T. castaneum (muscle 312 development), this portion was 21%, which still indicates a significant degree of unexplored biology. 313 Third, the D. melanogaster numbers could be influenced by false negative data. The data on FlyBase 314 has not been gathered in one or few standardized screens where all data were published -it is 315 mainly based on published results of single gene analyses. However, not all genetic screens have 316 reached saturation and not all genes detected in large-scale screens may have been further analyzed 317 and published. Hence, the number of genes in principle detectable in D. melanogaster might actually 318 be larger than the numbers extracted from FlyBase. In the iBeetle screen, in contrast, negative data 319 was systematically documented, such that this type of uncertainty is restricted to technical false 320 negative data, which we found to be around 15% in this first pass screen (Fig. 1). This uncertainty 321 could potentially increase the portion of D. melanogaster specific or conserved genes. Fourth, 322 theoretically there may be false positive data albeit restricted to the set of genes detected in both 323 species. The reason is that iBeetle was a first pass screen, where we aimed at reducing false negative 324 data with the tradeoff that false positive data are enriched (Schmitt-Engel et al., 2015). Although 325 finding similar phenotypes in two different species will not in many cases be false positive, we tried 326 to minimize this error by manually checking the annotations of the respective genes, excluding those 327 that showed a phenotype with low penetrance or in combination with many other defects indicating 328 a non-specific effect. Of note, the issue of false positives is restricted to the genes detected in both 329 species (column 2; based on Fig. 3). It does not apply to those genes detected only in the beetle but 330 not the fly (column 3; based on Fig. 2) because in this case, all phenotypes were confirmed by 331 independent experiments with non-overlapping dsRNA fragments in different genetic backgrounds 332 such that false positive results are excluded. In summary, while there are a number of uncertainties 333 that we could not clarify with available data or methods, most of these uncertainties hint at 334 underestimation rather than overestimation of functional divergence between fly and beetle. 335 336 Technical characteristics contribute to the detection of unequal gene sets 337 Our numbers reveal that functionally comparable gene sets in two quite closely related model 338 systems are far from identical. A question of obvious biological relevance but not easily resolved is: 339 to which degree do these differences reflect biologically meaningful divergence of gene functions, or 340 alternatively, simply result from technical problems, i.e. reflect different strengths and weaknesses of 341 the respective screening methods and model systems? 342 As discussed above, some degree of false negative data may be expected in both model systems. In 343 case of the iBeetle screen, this will be restricted to technical false negative data. In the D. 344 melanogaster field, there may be additional false negative data due to the lack of saturation of 345 screens and/or lack of reporting of genes that were not studied in detail. However, given the extent 346 and comprehensiveness of work in the D. melanogaster field we feel that this might not be of high 347 relevance. As to different strengths of screening procedures, it is certainly true that the way screens 348 are performed influences what sets of genes can be detected. For instance, our parental RNAi 349 approach knocked down both, maternal and zygotic contributions while some classic D. Most relevant for the field of functional genetics is our conclusion that the degree of divergence of 370 gene functions is larger than previously assumed. Therefore, some genes are detected only in one 371 species because the gene's function is not required for that process in the other. Indeed, there is 372 evidence supporting this view. In a recent study, a number of muscle genes identified in the iBeetle 373 screen were more closely investigated in D. melanogaster (Schultheis et al., 2019a; Schultheis et al., 374 2019b). Despite some efforts, the negative data for fly orthologs appeared to be real negative. For 375 example, null mutations of one of the genes found in our beetle, nostrin, did not elicit a phenotype in 376