Identification of essential genes in Caenorhabditis elegans through whole genome sequencing of legacy mutant collections

It has been estimated that 15-30% of the ∼20,000 genes in C. elegans are essential, yet many of these genes remain to be identified or characterized. With the goal of identifying unknown essential genes, we performed whole genome sequencing on complementation pairs from legacy collections of maternal-effect lethal and sterile mutants. This approach uncovered maternal genes required for embryonic development and genes with putative sperm-specific functions. In total, 58 essential genes were identified on chromosomes III, IV, and V, of which 49 genes are represented by novel alleles in this collection. Of these 49 genes, 19 (40 alleles) were selected for further functional characterization. The terminal phenotypes of embryos were examined, revealing defects in cell division, morphogenesis, and osmotic integrity of the eggshell. Mating assays with wild-type males revealed previously unknown male-expressed genes required for fertilization and embryonic development. The result of this study is a catalogue of mutant alleles in essential genes that will serve as a resource to guide further study toward a more complete understanding of this important model organism. As many genes and developmental pathways in C. elegans are conserved and essential genes are often linked to human disease, uncovering the function of these genes may also provide insight to further our understanding of human biology.


INTRODUCTION
In the legacy mutant collections described above, where large numbers of mutants are isolated, it is feasible to obtain complementation groups with multiple alleles for many loci. In addition, the abundance of mutants obtained in these large-scale genetic screens suggests that some legacy mutant collections may harbor strains for which the mutations remain unidentified. If identifying mutants. Additionally, some of the genes identified as essential in RNAi screens may belong to paralogous gene families whose redundant functions are masked in single gene knockouts. Although the total number of essential genes in C. elegans is unknown, extrapolation from saturation mutagenesis screens has led to estimates that approximately 15-30% of the ~20,000 genes in this organism are essential (Clark et al. 1988; Howell and Rose 1990; Johnsen and Baillie 1997; The C. elegans Deletion Mutant Consortium 2012). This suggests the possibility that there are many essential genes in C. elegans that remain unidentified and/or lack representation by a null allele.

In this study, we use WGS to revisit two C. elegans legacy mutant collections isolated more than 25 years ago. These collections are a rich resource for essential gene discovery; they comprise 75 complementation groups in which at least two alleles with sterile or maternal-effect lethal phenotypes have been found. With these collections, we sought to identify novel essential genes and to conduct a preliminary characterization of their roles in fertilization and development. Wild-type male rescue assays are used to attribute some mutant phenotypes to sperm-specific genetic defects. In addition, we examine arrested embryos using differential interference contrast (DIC) microscopy and document their terminal phenotypes. This work comprises a catalogue of 123 alleles with mutations in 58 essential genes on chromosomes III, IV, and V. Of these 58 genes, 49 are represented by novel alleles in this collection. We present several genes which are reported here for the first time as essential genes and mutant alleles for genes that have only previously been studied with RNAi knockdown. The aim of this work is to help accelerate research efforts by identifying essential genes and providing an entry point into further investigations of gene function. Advancing our understanding of essential genes is imperative to reaching a more comprehensive knowledge of gene function in C. elegans and may provide insight into conserved processes in developmental biology, parasitic nematology, and human disease.

Generation of legacy mutant collections
Mutant strains were isolated in screens for maternal-effect lethal and sterile alleles in the early 1990s by Heinke Holzkamp and Ralf Schnabel (unpublished data), and Richard Feichtinger (Feichtinger 1995). Two balancer strains were used for mutagenesis; GE1532: unc-32(e189)/qC1 subjected to ethyl methanesulfonate (EMS) mutagenesis at 20° as described by Brenner (1974), with a mutagen dose of 50-75 mM and duration between 4 and 6 hours. Following mutagenesis, L4 F1 animals were singled on plates at either 15° or 17°. Animals with homozygous markers in the F2 or F3 generation were transferred to 25° and subsequently screened for the production of dead eggs, unfertilized oocytes, or no eggs laid. The two mutant collections analyzed in this study are summarized in Table 1.

List of strains
The wild-type Bristol N2 derivative PD1074 and strains with the following mutations were used: him-3(e1147), unc-32(e189), qC1[dpy-19(e1259) glp-1(q339)], him-9(e1487), unc-24(e138), dpy- the Caenorhabditis Genetics Center (University of Minnesota). Nematode strains were cultured as previously described by Brenner (1974 Thompson et al. 2013). Reads were aligned to the C. elegans reference reads at that location. The remaining candidates were then subjected to a series of custom filters. Any variants that appeared in more than three strains from the same collection were removed. The remaining list was filtered to only include heterozygous mutations affecting coding exons (indels, missense and nonsense mutations) and splice sites (defined as the first two and last two base pairs in an intron). Finally, the list of candidate mutations was trimmed to include only mutations on the chromosome to which the mutation had originally been mapped.
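
The custom filters described above can be sketched in code. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the `Variant` record, its field names, the effect and zygosity labels, and the `filter_candidates` function are all hypothetical, and the alignment, variant-calling, and annotation steps that would produce such records are not shown.

```python
from dataclasses import dataclass


# Hypothetical record for one annotated variant call; all field names are assumptions.
@dataclass(frozen=True)
class Variant:
    strain: str
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str
    effect: str    # assumed labels: "indel", "missense", "nonsense", "splice_site", ...
    zygosity: str  # assumed labels: "het" or "hom"


# Coding-exon and splice-site effect classes retained by the filter (assumed labels).
KEPT_EFFECTS = {"indel", "missense", "nonsense", "splice_site"}


def filter_candidates(variants, mapped_chrom, max_strains=3):
    """Apply the three filters in the order given in the text.

    variants: all variant calls from one mutant collection.
    mapped_chrom: dict mapping strain name to the chromosome on which
        its mutation was originally mapped.
    """
    # Count how many distinct strains carry each exact variant.
    strains_by_site = {}
    for v in variants:
        site = (v.chrom, v.pos, v.ref, v.alt)
        strains_by_site.setdefault(site, set()).add(v.strain)

    kept = []
    for v in variants:
        site = (v.chrom, v.pos, v.ref, v.alt)
        # 1. Drop variants seen in more than max_strains strains of the collection.
        if len(strains_by_site[site]) > max_strains:
            continue
        # 2. Keep only heterozygous coding-exon or splice-site changes.
        if v.zygosity != "het" or v.effect not in KEPT_EFFECTS:
            continue
        # 3. Keep only variants on the chromosome the mutation was mapped to.
        if v.chrom != mapped_chrom.get(v.strain):
            continue
        kept.append(v)
    return kept
```

In this sketch, a variant shared by many strains of a collection is treated as likely background rather than a causal mutation, which is one plausible reading of the first filter.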

For each pair of strains belonging to a complementation group, the final list of candidate mutations was compared and the gene or genes in common were identified. In cases where there was only one gene in common on both lists, this gene was designated the candidate essential gene. For complementation groups with multiple candidate genes in common, additional information such as the nature of the mutations and existing knowledge about the genes was used to select a single candidate gene, when possible. When there was no gene candidate in common within a pair of strains, the list of variants was reanalyzed to look for larger

Human orthologs of C. elegans genes were determined using Ortholist 2 (ortholist.shaye-lab.org; Kim et al. 2018). For maximum sensitivity, the minimum number of programs predicting a given ortholog was set to one. NCBI BLASTp (blast.ncbi.nlm.nih.gov; Altschul et al. 1990) was used to examine distributions of homologs across species and potential nematode-specificity in genes with no human orthologs. Protein sequences from the longest transcript of each gene were used to query the non-redundant protein sequences (nr) database, with default parameters and a maximum of 1,000 target sequences. The results were filtered with an E-value threshold of 10^-5.

Temperature sensitivity and mating assays
To assay temperature sensitivity, heterozygous strains were propagated at 15° and homozygous L4 animals were isolated on 60 mm NGM plates (2 x 6/plate or 3 x 3/plate
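
The comparison of candidate lists within a complementation pair, described earlier in this section, amounts to a set intersection. The sketch below is illustrative only: the function name `call_candidate_gene`, the outcome labels, and the placeholder gene names in the usage note are assumptions, not part of the analysis pipeline used in this study.

```python
def call_candidate_gene(genes_a, genes_b):
    """Compare the candidate genes of two strains in one complementation group.

    genes_a, genes_b: iterables of gene names carrying a filtered candidate
        mutation in each strain.
    Returns (outcome, shared_genes), where outcome is one of:
        "single"   - exactly one gene in common: the candidate essential gene
        "multiple" - several genes in common: needs additional information
        "none"     - no gene in common: variants need to be reanalyzed
    """
    shared = sorted(set(genes_a) & set(genes_b))
    if len(shared) == 1:
        return "single", shared
    if len(shared) > 1:
        return "multiple", shared
    return "none", shared
```

For example, with placeholder names, `call_candidate_gene(["gene-A", "gene-B"], ["gene-B", "gene-C"])` reports `gene-B` as the single shared candidate, whereas two lists with no overlap return the `"none"` outcome that would trigger reanalysis.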

RESULTS
Identification of 58 essential genes
Whole genome sequencing was performed on a total of 157 strains, with depth of coverage ranging between 21x and 65x (average = 38x). A minimum of two alleles for each of 75 complementation groups were sequenced and a total of 58 essential genes were identified ( approach. Eight of the nine genes represented in this blind test set were correctly identified by our pipeline, whereas one gene escaped identification. This was due to an intronic mutation that did not pass our filtering criteria but was found upon manual inspection of the sequencing data. While the list of 58 genes includes many known essential genes, among the known genes are alleles that are novel genetic variants. Nineteen genes from this collection which were not previously studied or were not represented by lethal or sterile mutants were designated Genes of Interest (GOI; Table 3). These 19 GOI, represented by 40 alleles, were further characterized as part of this study. They include 14 genes (28 alleles) with a maternal-effect lethal phenotype and

After isolation, the mutant alleles were each localized to a chromosomal region through deficiency mapping. This data was used to corroborate the candidate gene identities derived from WGS analysis and to resolve complementation groups with more than one gene candidate. For the majority of complementation groups, the genomic position of the assigned gene was in agreement with the deficiency genetic mapping data (Figure 1).

There were some conflicts between the deficiency mapping data and the gene candidates proposed through WGS analysis. Three complementation groups that were found to not map under any of the tested deficiencies were assigned gene candidates whose genomic coordinates fall into regions covered by the tested deficiencies (alleles of bckd-1A, top-3, and unc-112; Figure 1). In addition, two of these groups were assigned the same gene identity as another, purportedly distinct, complementation group (Table 4). From WGS analysis, bckd-1A was the initial gene candidate for two different complementation groups, yet only one of these groups had been mapped to a deletion (tDf5) that covers the bckd-1A locus. Similarly, top-3 was the assigned gene candidate for three different complementation groups, only one of which was mapped under a deficiency (tDf5) encompassing that gene. By performing complementation tests with select alleles (Table 4) Table S1)

Human orthologs, gene ontology, and expression patterns
Of the 58 essential genes identified, 47 genes have predicted human orthologs (Table 2)

The 40 alleles associated with the 19 GOI were further examined to gain insight into the phenotypic consequences of their mutations. Each allele was assayed for temperature sensitivity, as some of the original mutant screening was carried out at 25°C. Five alleles (marked with a [ts] phenotype in Table 3) were deemed temperature sensitive and could proliferate as homozygotes at a permissive temperature of 15°C, while being maternal-effect lethal or sterile at a restrictive temperature of 25°C. Curiously, four of these temperature sensitive alleles were the results of stop codons, not missense mutations.

Seven candidate genes (16 alleles) were hypothesized to be involved in male fertility, based on the production of unfertilized oocytes by hermaphrodites and/or predominantly male gene expression patterns. These 16 strains were assayed for their ability to be rescued through mating with wild-type males. 14 of the strains were rescued by the mating assay, while two strains failed to rescue (Table 5) after incubation in distilled water, while three additional strains had only some embryos that exhibited this phenotype (Table 3). The OID phenotype was evident in embryos that filled the eggshell completely (for example, dgtr-1(t2043), Figure 3A) and eggs that burst in their hypotonic surroundings. Early embryonic arrest was observed in embryos from the two dlat-1 mutant strains (t2035 and t2056), which arrested most often with only one to four cells (for example, Figure 3B). Eleven strains had embryos that terminated with approximately 100-200 cells (for example, ZK688.9(t1433), Figure 3C); while four strains developed into two- or three-fold stage embryos that did not hatch and exhibited clear morphological defects, such as nstp-2(t1835) with a lumpy body wall and constricted nose tip (Figure 3D).

Revisiting legacy mutant collections with whole genome sequencing
In this study, we focused on reexamining legacy collections of C. elegans mutants isolated before the complete genome sequence was published (The C. elegans Sequencing Consortium 1998) and long before massively parallel sequencing was widely available. With major advances in sequencing technology in the past 30 years (reviewed in Goodwin et al. 2016), WGS has become affordable and accessible, making it possible to revisit past projects with new approaches and advanced capabilities. We have sequenced paired alleles from 75 complementation groups on chromosomes III, IV, and V, from which we identified 58 essential genes (Table 2).

While WGS is a powerful tool, it does not stand alone as a solution to identifying mutant alleles. This study has shown the power of having multiple alleles in a complementation group when faced with the abundance of genomic variants found in WGS analysis. Indeed, when we sequenced four single alleles, which had no complementation pairs, we were unable to designate a single mutation as the variant responsible for maternal-effect lethality (data not shown). Our approach to gene identification proved to be effective and was validated by a combination of different methods. The blind test set of 17 previously sequenced alleles from which eight of nine genes were readily identified serves as an important validation of our analysis pipeline and gives confidence in the results we obtained. In addition, the deficiency mapping data, gene expression patterns from the modENCODE project, GO term analysis, and phenotypes documented from previous experiments provide evidence to support the gene identities we assigned in these mutant collections.

The CRISPR-Cas9 deletion alleles we generated for selected gene candidates provide additional validation and will be made available to the research community to serve as useful tools for future studies. While the mutant alleles from the original study have been outcrossed, the genetic balancer background and additional mutations that persist can complicate phenotypic analysis. In contrast, these new CRISPR-Cas9 deletion strains were made in a wild-type background, which makes it much easier to handle them and interpret their mutant phenotypes. were unable to identify will require repeating complementation tests and re-tooling the analysis approach. collection lacked sufficient annotation to be interpreted this way. We found four genes about which there is little to nothing known (D2096.12, F56D5.2, T22B11.1, and Y54G2A.73). For example, F56D5.2 is a gene with no associated GO terms, no known protein domains, and no orthologs in other model organisms. These wholly uncharacterized genes are intriguing candidates which may help uncover new biological processes and biochemical pathways that are evidently fundamental to life for this organism.

Examining expression patterns leads to discovery of genes involved in male fertility
The life stage-specific expression patterns (Supplementary Appendix S2) provide some insight into the roles the genes in this collection play in development. 15 of the 19 GOI are highly expressed in the early embryo and hermaphrodite gonad, which suggests that the gene product is passed on to the embryo from the parent. Five of these maternal genes also have elevated expression during late embryonic and larval stages, which suggests they are pleiotropic. The zygotic functions of these genes must be non-essential or else a zygotic lethal, rather than maternal-effect lethal, phenotype would be observed.

We also identified four genes that are most highly expressed in males and L4 hermaphrodites, as well as three genes that have prominent male expression in addition to characteristic maternal expression patterns. Mating assays confirmed that these male-expressed genes have an essential role in male fertility. Studies have shown that genes expressed in sperm are largely insensitive to genes that remain to be discovered.

We propose that the seven male-expressed genes are involved in sperm production and/or function (see Table 5). These genes are mostly uncharacterized, and this is the first reporting of their involvement in male fertility. While the mutant hermaphrodites lay unfertilized oocytes (5 genes) or dead eggs (2 genes), this phenotype could be rescued in 14 of the 16 alleles by the introduction of wild-type sperm through mating. The two alleles that could not be rescued had allele pairs in the same complementation groups that were rescued in the mating assay. One of these discrepancies, between F56D5.2(t1744) and F56D5.2(t1791), was resolved when we found a second mutation in a nearby essential gene that was likely responsible for the inability of one strain to be rescued (data not shown). The presence of additional lethal mutations in the genome is unsurprising given the nature of chemical mutagenesis, and it reinforces the advantage of having multiple alleles for a gene when interpreting mutant phenotypes. alleles of dlat-1 in this study (t2035 and t2056) in which most embryos arrest at the one- to four-cell stage (Figure 3B). The mutant alleles presented here can confirm previously reported phenotypes and serve as new genetic tools for continuing the study of essential gene function.

We also identified alleles for six genes that exhibit an osmotic integrity defective (OID) phenotype, resulting in embryos that filled the eggshell completely or burst in distilled water.

Most of the mutant strains we examined with DIC microscopy arrested around the 100- to 200-cell stage as a seemingly disorganized group of cells (for example, Figure 3C). Others developed into two-fold or later stage embryos that moved inside the eggshell but did not hatch (for example, Figure 3D

It is our hope that the alleles and phenotypes presented here will serve as a starting point and guide future research to elucidate the specific roles these genes play in embryogenesis. All of the alleles presented in this study are available to the research community through the Caenorhabditis Genetics Center (cgc.umn.edu) and we anticipate they will serve as a valuable resource in the years to come. The wealth of material uncovered in this specific legacy collection will hopefully inspire similar explorations of other frozen mutant collections.

The authors thank Mark L. Edgley for advice and help with strain maintenance, as well as Negin Khosravi, who replicated some of the nematode assays and conducted PCR assays with F56D5.2(t1744) to reveal an additional mutation in a nearby essential gene. This work was supported by a CIHR Canada Graduate Scholarship-Master's (awarded to EL) and CIHR grant PJT-148549 (awarded to DGM). This work was also supported by a grant from NSERC to DGM and an