Abstract
Infertility is a highly heterogeneous condition, with genetic causes estimated to be involved in approximately half of cases. High-throughput sequencing (HTS) is becoming an increasingly important tool for the genetic diagnosis of diseases, including idiopathic infertility. Nevertheless, most rare or minor alleles revealed by HTS are classified as variants of uncertain significance (VUS). Interpreting the functional impact of VUS is challenging but profoundly important for clinical management and genetic counseling. To determine the consequences of segregating polymorphisms in key fertility genes, we functionally evaluated 8 missense variants in the genes ANKRD31, BRDT, DMC1, EXO1, FKBP6, MSH4 and SEPT12 by generating genome-edited mouse models. Six variants were classified as deleterious by most functional prediction algorithms, and two disrupted a protein-protein interaction in the yeast two-hybrid assay. Even though these genes are known to be essential for normal meiosis or spermiogenesis in mice, none of the tested human variants compromised fertility or gametogenesis in the mouse models. These results should be useful for the genetic diagnosis of infertile patients, but they also underscore the need for more effective VUS categorization. To this end, we evaluated the performance of 10 widely used pathogenicity prediction algorithms in classifying missense variants within fertility-related genes from two sources: 1) the ClinVar database, and 2) variants functionally tested in mice. All the algorithms performed poorly at predicting the phenotypes of mouse-modeled variants. These studies highlight the importance of functional validation of potential infertility-causing genetic variants.
Introduction
A major challenge in human genetics is to interpret VUS residing within disease-associated genes. In the absence of experimental data testing the functional consequences of such variants (namely single nucleotide polymorphisms, or SNPs), or of phenotypic data on individuals and control groups bearing the VUS, functional prediction algorithms are commonly used. However, these predictors are not accurate enough to serve as a basis for clinical decisions without corroborating information 1. Compounding our relatively weak knowledge of human infertility genetics is the large number of genes required for normal fertility 2. Thus, even for infertile individuals who have undergone whole genome or exome sequencing (WGS and WES, respectively), any causal SNP or private mutation will lie within a background of VUS in candidate genes, making it difficult to conclusively implicate any single variant as responsible for infertility. For all but the rarest SNPs, insights into potential effects can alternatively be gleaned from genome-wide association studies (GWAS) designed to dissect genetic contributions to disease or phenotypic states. Unfortunately, GWAS have had very limited success in mapping infertility alleles, an outcome exacerbated by the likelihood that common variants are not responsible in most cases 3.
CRISPR/Cas9-mediated genome editing provides a means of evaluating potential human disease variants in an appropriate in vivo system. We have developed an integrated computational and experimental approach to functionally assess potentially deleterious missense variants in essential reproductive genes by modeling them in mice 4. Our original criteria for selecting candidate infertility missense variants for modeling were that: a) they reside in genes that are essential for fertility in mice; b) they alter an amino acid conserved between mice and humans; and c) the amino acid change is predicted to be deleterious by various bioinformatic tools. Using this approach, we tested several missense variants in genes important for meiosis and found that a substantial fraction had no clear impact on fertility 5–8. To increase the likelihood that missense variants selected for mouse modeling impact fertility, we added two in vitro pre-screens to our selection pipeline 9–11. The first prioritizes VUS that disrupt a known protein-protein interaction (PPI), because human disease mutations are overrepresented among such variants 12; the second assesses protein stability defects in cultured cells 10.
Recommendations by the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) have been widely adopted by clinical laboratories around the world to guide the clinical interpretation of sequence variants 13. These guidelines provide criteria for classifying variants as pathogenic (P), likely pathogenic (LP), VUS, likely benign (LB), or benign (B). The classifications are based on distinct evidence types, each of which is assigned a level of strength. The Clinical Genome Resource (ClinGen) approves self-organized Variant Curation Expert Panels (VCEPs) in specific disease areas, e.g., hearing loss and cardiomyopathy, to make gene-centric specifications to the ACMG/AMP guidelines. The VCEPs provide disease-specific rules for accurate variant interpretation and classify variants in the genes and/or diseases within their scope. As a ClinGen partner, the ClinVar database archives genetic variants together with their predicted or known impacts on diseases and other conditions 14. Together, these two entities provide complementary resources to support genomic interpretation. Unfortunately, ClinGen does not have a reproduction- or infertility-related VCEP to evaluate variants that cause infertility, which impedes clinical action. Consequently, functional evidence from the literature becomes crucial. Given the lack of appropriate in vitro or ex vivo systems for evaluating VUS in genes involved in certain aspects of reproduction, such as spermatogenesis, and in the absence of well-characterized families segregating a suspect variant, in vivo evidence from knock-in/humanized animal models is currently one of the most rigorous ways to test the roles of VUS in fertility.
Here, we report experiments testing 8 missense variants in 7 genes crucial for various processes in meiosis or spermiogenesis. Six of these genes are known to play essential roles in various steps of meiotic recombination or meiotic chromosome organization (Ankrd31, Brdt, Dmc1, Exo1, Fkbp6 and Msh4), and the other (Sept12) is important for sperm morphogenesis and motility. We performed functional interpretation by constructing and analyzing mouse models containing the orthologous amino acid substitutions. Although in silico predictions or yeast two-hybrid (Y2H) screening indicated that these variants are harmful, mice bearing them were fertile. Our assessment of these and other functionally interpreted alleles in mouse models, as well as of infertility-related missense variants in the ClinVar database, shows that in vivo assays are crucial even when informed by computational predictions and in vitro assays.
Materials and Methods
Variant prioritization
The potential pathogenic effects of missense variants in human DMC1, FKBP6, EXO1, MSH4, BRDT and ANKRD31 were predicted using SIFT 15 and PolyPhen-2 16. Variant frequencies were obtained from the Genome Aggregation Database (gnomAD; v2.1.1 for FKBP6, EXO1, MSH4, BRDT and ANKRD31, and v3.1.1 for DMC1; gnomad.broadinstitute.org) as of April 2021. Domain structures of the human proteins were drawn using Domain Graph 17.
Y2H screening
Y2H screening of SEPT12 variants was performed as previously described 9. Disrupted PPIs were identified by the following criteria: a) the mutant protein reduced growth by at least 50% relative to WT, as benchmarked by twofold serial dilution experiments; b) neither the WT nor the mutant DB-ORF was an autoactivator; and c) the reduced growth phenotype was reproduced across 3 replicates. A mutation was scored as disruptive if one or more of its PPIs were affected.
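The three disruption criteria can be expressed as a small scoring function. This is a minimal sketch under stated assumptions, not the pipeline of ref 9: it assumes growth in each replicate is recorded as the number of twofold serial-dilution spots that grow (a hypothetical encoding), so growth that stops one spot earlier than WT corresponds to a 50% reduction. The names `Replicate`, `ppi_disrupted` and `variant_disruptive` are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replicate:
    """One Y2H replicate for a single bait-prey pair (hypothetical encoding).

    wt_spots / mut_spots: number of twofold serial-dilution spots showing
    growth for the WT and mutant DB-ORF, respectively.
    """
    wt_spots: int
    mut_spots: int


def relative_growth(rep: Replicate) -> float:
    # Each dilution spot carries half the cells of the previous one, so growth
    # that stops k spots earlier than WT corresponds to 2**-k of WT growth.
    return 2.0 ** (rep.mut_spots - rep.wt_spots)


def ppi_disrupted(replicates: List[Replicate], wt_autoactivates: bool,
                  mut_autoactivates: bool, n_required: int = 3) -> bool:
    # Criterion b: interactions involving an autoactivating DB-ORF are not scored.
    if wt_autoactivates or mut_autoactivates:
        return False
    # Criterion c: the phenotype must be observed in 3 replicates.
    if len(replicates) < n_required:
        return False
    # Criterion a, required in every replicate: growth reduced by >= 50%.
    return all(relative_growth(r) <= 0.5 for r in replicates)


def variant_disruptive(ppi_calls: List[bool]) -> bool:
    # A mutation is scored as disruptive if one or more PPIs are affected.
    return any(ppi_calls)


# Example: the mutant grows one dilution spot less than WT in all 3 replicates.
reps = [Replicate(wt_spots=5, mut_spots=4) for _ in range(3)]
print(ppi_disrupted(reps, wt_autoactivates=False, mut_autoactivates=False))
```

Requiring the reduction in every replicate is one reading of "reproduced across 3 replicates"; a majority rule would be an equally plausible alternative.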
Generation of mouse models by CRISPR/Cas9
All animal usage was approved by Cornell University’s Institutional Animal Care and Use Committee under protocol 2004-0038 to J.C.S. Mouse models were generated by CRISPR/Cas9 genome editing as described previously 5. sgRNAs and ssODNs are listed in Table S1. In addition to the amino acid substitutions, several silent mutations were introduced to prevent the edited region from being recognized and recut by the ribonucleoprotein. Briefly, the sgRNA, ssODN, and Cas9 mRNA (25 ng/μL, TriLink) were co-injected into zygotes (F1 hybrids between strains FVB/NJ and B6(Cg)-Tyrc-2J/J), which were then transferred into the oviducts of pseudopregnant females. Dmc1, Fkbp6, Exo1, Msh4, Brdt and Ankrd31 founders carrying at least one copy of the desired alteration were identified and backcrossed to C57BL/6J (or to FVB/NJ in the case of Sept12 mutants) for at least three generations. For genotyping, 4 μL of crude DNA lysate, prepared as described previously 18 from ear punch biopsy specimens or toes of 7- to 14-day-old mice, was used. The primers used for genotyping the mutant mice are listed in Table S1. To distinguish WT from SNP alleles, PCR amplicons were analyzed by Sanger sequencing or restriction enzyme digestion.
Fertility testing
Two-month-old heterozygous and homozygous mice were mated with age-matched C57BL/6J (for Dmc1, Fkbp6, Exo1, Msh4, Brdt and Ankrd31 mice) or FVB/NJ (for Sept12 mice) partners until 7–15 months of age. The number of pups from each mating set and sex was recorded. Animals that produced no offspring for at least 3 months were considered infertile and were sacrificed for more detailed reproductive evaluation, as described in the following sections.
Sperm counts
Cauda epididymis sperm counting was performed using a standard method 19. One cauda epididymis per male was used for each data point.
Histology
For the preparation of paraffin blocks, tissues were fixed overnight in Bouin’s solution at room temperature, washed in 70% ethanol and then dehydrated and embedded in paraffin as previously described 5. Six-micron sections were deparaffinized and stained by hematoxylin and eosin (H&E).
Terminal deoxynucleotidyl transferase dUTP nick end labeling (TUNEL) staining
Testes were fixed in 4% paraformaldehyde and embedded in paraffin. Sections were deparaffinized and subjected to TUNEL staining using the DeadEnd™ Fluorometric TUNEL System (Promega, G3250) according to the manufacturer’s instructions.
Follicle counts
Ovaries were collected from 3- and 12-wk-old Ankrd31 mutant females, fixed in Bouin’s solution, embedded in paraffin, serially sectioned at 6 μm, and stained with H&E. Follicles were quantified as described previously 20. Every fifth section was examined for the presence of the following classes of oocytes/follicles: primordial, primary, secondary, preantral and antral. For quantitative analyses, follicles were grouped as primordial or growing (primary through antral).
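The counting scheme can be sketched as follows. The per-section input format and the starting section (`offset`) are hypothetical; following the Fig. 4G caption, counts are summed across every fifth section without any extrapolation factor.

```python
from typing import Dict, List

# Primary through antral follicles are pooled as "growing".
GROWING_CLASSES = {"primary", "secondary", "preantral", "antral"}


def count_follicles(sections: List[Dict[str, int]], step: int = 5,
                    offset: int = 0) -> Dict[str, int]:
    """Sum follicle counts over every `step`-th serial section.

    `sections` holds per-section counts keyed by follicle class, in serial
    order (hypothetical input format). Counts are summed, not extrapolated.
    """
    totals = {"primordial": 0, "growing": 0}
    for section in sections[offset::step]:
        for follicle_class, n in section.items():
            if follicle_class == "primordial":
                totals["primordial"] += n
            elif follicle_class in GROWING_CLASSES:
                totals["growing"] += n
            else:
                raise ValueError(f"unknown follicle class: {follicle_class}")
    return totals


# Six serial sections; with step=5 only sections 0 and 5 are examined.
serial = ([{"primordial": 10, "primary": 2}]
          + [{"primordial": 100}] * 4
          + [{"primordial": 5, "antral": 1}])
print(count_follicles(serial))  # {'primordial': 15, 'growing': 3}
```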
Meiotic spread preparation and immunostaining
Meiotic prophase I surface spreads were prepared as described previously 21. Briefly, testes were removed and decapsulated in hypotonic sucrose extraction buffer and left on ice for 1 h. Tubules were chopped on glass depression slides in a bubble of 0.03% sucrose and added to slides coated in 1% paraformaldehyde. Slides were slowly dried and washed in PBS with Photo-Flo (Kodak).
For staining, slides were blocked for 1 h at room temperature (RT) with sterile-filtered antibody dilution buffer (ADB; 10 mL normal goat serum, 3 g BSA and 50 μL Triton X-100 in 990 mL PBS) and then incubated overnight at 4 °C in a humidified chamber with rabbit polyclonal anti-H3K9me3 (Millipore Sigma; 07-442, 1:100) and mouse monoclonal anti-SYCP3 (Abcam; ab12452, 1:200) diluted in ADB. After washing in ADB, slides were incubated at 37 °C for 30 min with a 1:1000 dilution of the secondary antibodies goat anti-rabbit IgG 594 (Molecular Probes, A11012) and goat anti-mouse IgG 488 (Molecular Probes, A11001), then at 37 °C for 5 min with 500 ng/mL DAPI (4′,6-diamidino-2-phenylindole), and mounted in Vectashield antifade mounting medium (Vector, H-1000).
Diakinesis chromosome spreading
This procedure was described previously 22. Briefly, seminiferous tubules were treated with hypotonic solution (1% sodium citrate) for 30 min and minced carefully for 15 min. The cell suspension was centrifuged and the supernatant removed. Cells were then fixed three times in methanol:glacial acetic acid (3:1) and dropped onto slides. After drying, the cells were stained with Giemsa solution.
Data preprocessing
Infertility-related variants were downloaded using the web-based ClinVar Miner tool (version 2021-5-29) 23. Variants associated with the terms “female infertility”, “genetic infertility”, “infertility disorder” and “male infertility” and with a one-star review status (“criteria provided”) were selected. The variants were organized into five categories (pathogenic, likely pathogenic, VUS, likely benign and benign), and only missense variants with an SNP rsID were retained. Sets of deleterious and benign missense variants that have been functionally interpreted in mouse models were collected from our experiments (http://www.infertilitygenetics.org/) and from literature searches of PubMed, Google Scholar, bioRxiv and medRxiv (the “Mouse_all” dataset, n=102 in total; Table S2); the infertility variants among them formed the “Mouse_infertility” dataset (n=34). The analysis was restricted to SNPs with rsIDs reported in the gnomAD database. ClinVar annotations and star ratings are also listed in Table S2.
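The ClinVar selection steps can be sketched as a filter over exported records. This is an illustration only: the record keys (`condition`, `stars`, `significance`, `consequence`, `rsid`) are hypothetical and do not reproduce the ClinVar Miner export format, and treating "one-star" as a minimum of one review star is an assumption exposed as the `min_stars` parameter.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

INFERTILITY_TERMS = {"female infertility", "genetic infertility",
                     "infertility disorder", "male infertility"}
CATEGORIES = ("pathogenic", "likely pathogenic", "uncertain significance",
              "likely benign", "benign")


def select_clinvar_missense(records: Iterable[dict],
                            min_stars: int = 1) -> Dict[str, List[str]]:
    """Group infertility-related missense variants with an rsID by category.

    Each record is a dict with hypothetical keys: 'condition', 'stars',
    'significance', 'consequence' and 'rsid'.
    """
    by_category: Dict[str, List[str]] = defaultdict(list)
    for r in records:
        if r["condition"].lower() not in INFERTILITY_TERMS:
            continue
        if r["stars"] < min_stars:  # review-status filter (assumed >= 1 star)
            continue
        if r["consequence"] != "missense" or not r.get("rsid"):
            continue  # keep missense variants that have an rsID only
        significance = r["significance"].lower()
        if significance in CATEGORIES:
            by_category[significance].append(r["rsid"])
    return dict(by_category)


# Placeholder records, not real ClinVar entries.
records = [
    {"condition": "Male infertility", "stars": 1, "significance": "Pathogenic",
     "consequence": "missense", "rsid": "variant_A"},
    {"condition": "male infertility", "stars": 0, "significance": "Benign",
     "consequence": "missense", "rsid": "variant_B"},
    {"condition": "male infertility", "stars": 1, "significance": "Benign",
     "consequence": "synonymous", "rsid": "variant_C"},
    {"condition": "female infertility", "stars": 2, "significance": "Benign",
     "consequence": "missense", "rsid": None},
    {"condition": "female infertility", "stars": 1, "significance": "Benign",
     "consequence": "missense", "rsid": "variant_E"},
]
print(select_clinvar_missense(records))
```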
Predictor calls of missense variants from the two datasets were derived using dbNSFP v4.1a 24, which is integrated into the Ensembl Variant Effect Predictor (VEP). Only the “pathogenic” and “benign” variants in ClinVar were grouped into the “ClinVar_infertility” dataset (n=697). The cutoff scores for predictors are listed in Table S3.
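Converting continuous dbNSFP scores into binary calls depends on each tool's cutoff and on the direction of its score. The sketch below is illustrative: only the CADD cutoff (>25) comes from this study's Results; the REVEL and SIFT thresholds are placeholders, the dictionary keys are hypothetical labels, and the values actually used are those in Table S3. SIFT is included to show an inverted score, where lower values are more damaging.

```python
from typing import Dict, Optional, Tuple

# Illustrative cutoffs only; the values actually used are listed in Table S3.
# Each entry is (threshold, higher_score_means_deleterious).
CUTOFFS: Dict[str, Tuple[float, bool]] = {
    "CADD_phred": (25.0, True),  # CADD > 25, as stated in the Results
    "REVEL": (0.5, True),        # placeholder value
    "SIFT": (0.05, False),       # placeholder; lower SIFT scores are more damaging
}


def call_variant(tool: str, score: Optional[float]) -> Optional[str]:
    """Convert one predictor score into a binary call (None if unscored)."""
    if score is None:
        return None  # no score available: reported as N.A.
    threshold, higher_is_deleterious = CUTOFFS[tool]
    if higher_is_deleterious:
        return "deleterious" if score > threshold else "benign"
    return "deleterious" if score < threshold else "benign"


print(call_variant("CADD_phred", 27.3), call_variant("SIFT", 0.31))
```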
Evaluation of pathogenicity predictor performances
To compare the performance of the prediction tools, we used statistical metrics derived from a confusion matrix, treating deleterious variants as the positive class and benign variants as the negative class. A true positive (TP) is a deleterious variant that is predicted to be deleterious, and a true negative (TN) is a benign variant that is predicted to be benign. Conversely, a false positive (FP) is a benign variant that is predicted to be deleterious, and a false negative (FN) is a deleterious variant that is predicted to be benign.
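The four confusion-matrix cells can be tallied from paired observed and predicted labels, as in the following sketch. It assumes variants with no prediction (N.A.) have already been removed, and uses `True` for deleterious; the function name is illustrative.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[bool],
                     predicted: Sequence[bool]) -> Dict[str, int]:
    """Tally TP/TN/FP/FN; True = deleterious (positive), False = benign."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for is_deleterious, called_deleterious in zip(actual, predicted):
        if is_deleterious and called_deleterious:
            counts["TP"] += 1
        elif not is_deleterious and not called_deleterious:
            counts["TN"] += 1
        elif called_deleterious:  # benign variant called deleterious
            counts["FP"] += 1
        else:  # deleterious variant called benign
            counts["FN"] += 1
    return counts


actual = [True, True, False, False, False]     # e.g. phenotype observed in mice?
predicted = [True, False, True, False, False]  # e.g. tool calls the variant deleterious?
print(confusion_counts(actual, predicted))  # {'TP': 1, 'TN': 2, 'FP': 1, 'FN': 1}
```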
The Matthews correlation coefficient (MCC) was used to measure the correlation between the actual class of variants and the predictions made by the classifiers:
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
MCC ranges from −1 to 1. A coefficient of 1 represents a perfect prediction, 0 is no better than random prediction, and −1 indicates total disagreement between prediction and observation.
Prediction precision, also known as the positive predictive value (PPV), was calculated as:
$$\mathrm{PPV} = \frac{TP}{TP + FP}$$
Prediction accuracy (ACC) was calculated as:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$$
ACC ranges from 0 to 1; a perfect tool has an ACC of 1.
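The MCC, PPV and ACC formulas translate directly into code. In this sketch, returning an MCC of 0 when any marginal total is zero is a common convention assumed here, not one stated in the Methods; the false positive rate, FP/(FP + TN), which the Results report for benign variant sets, is included for convenience.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC = 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def ppv(tp: int, fp: int) -> float:
    # Undefined (NaN) when the tool calls no variant deleterious.
    return tp / (tp + fp) if tp + fp else float("nan")


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    # Fraction of benign variants called deleterious (type I error rate).
    return fp / (fp + tn) if fp + tn else float("nan")


print(mcc(1, 2, 1, 1), ppv(1, 1), accuracy(1, 2, 1, 1), false_positive_rate(1, 2))
```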
We used receiver operating characteristic (ROC) curves to visualize the trade-off between sensitivity and specificity of each binary classifier. An ROC curve plots the fraction of all positives (TP + FN) that are true positives (sensitivity) against the fraction of all negatives (TN + FP) that are false positives (1 − specificity) as the score threshold is varied.
The area under the ROC curve (AUC) was used to measure how well each predictor discriminated deleterious from benign variants across all score thresholds. The AUC ranges from 0 to 1; a perfect tool has an AUC of 1, and a random classifier has an AUC of 0.5. ROC curves were calculated from the prediction scores derived from dbNSFP v4.1a using the Statistical Package for the Social Sciences (SPSS) software.
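The analysis used SPSS; as a tool-independent illustration, the following pure-Python sketch builds ROC points by sweeping the threshold from the highest score downward and computes the AUC in two equivalent ways: the trapezoidal area under those points, and the probability that a randomly chosen deleterious variant outscores a randomly chosen benign one (ties counted as 1/2). It assumes higher scores mean "more deleterious", so inverted scores such as SIFT's would be negated first.

```python
from typing import List, Sequence, Tuple


def _class_sizes(labels: Sequence[bool]) -> Tuple[int, int]:
    n_pos = sum(1 for lab in labels if lab)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one deleterious and one benign variant")
    return n_pos, n_neg


def roc_points(scores: Sequence[float],
               labels: Sequence[bool]) -> List[Tuple[float, float]]:
    """(1 - specificity, sensitivity) pairs as the threshold is lowered.

    labels: True = deleterious. Scores must increase with predicted
    deleteriousness.
    """
    n_pos, n_neg = _class_sizes(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Variants with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    # Area under the piecewise-linear ROC curve.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_rank(scores: Sequence[float], labels: Sequence[bool]) -> float:
    # P(random deleterious variant outscores random benign one), ties = 1/2.
    _class_sizes(labels)
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6]
labels = [True, False, True, False]
print(auc_trapezoid(roc_points(scores, labels)), auc_rank(scores, labels))
```

Both functions return 0.75 for the toy example; agreement between the two is a useful sanity check when ties are present.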
Image collections and statistics
Fluorescent images were captured with an Olympus XM10 camera and bright-field images with an Olympus SC30 camera. Cropping, color, and contrast adjustments were made with Adobe Photoshop CC 2020, using identical background adjustments for all images. All data are expressed as mean ± SEM. Statistical analyses were carried out in R using an unpaired Student’s t-test, one-way analysis of variance (ANOVA) followed by Tukey’s post hoc test, or a two-sided Fisher’s exact test. Graphs were generated using R. ROC curves were generated with SPSS software (version 20.0; IBM).
Results
Selection of reproduction genes and variants evaluated in this study
The process of spermatogenesis comprises multiple tightly regulated steps of differentiation (Fig. S1). It begins with several mitotic divisions of cohorts of spermatogonial stem cells that ultimately enter meiotic prophase I. During the leptotene substage of prophase I, programmed double strand breaks (DSBs) are introduced at preferred sites throughout the genome. These DSBs are recognized by a number of DNA damage response proteins that ultimately repair the DSBs via homologous recombination, and this process drives pairing and synapsis of homologous chromosomes 25. Prophase I is followed by two cell divisions without intervening DNA replication, leading to haploid, round spermatids that subsequently undergo a dramatic morphological transformation into spermatozoa during the differentiation process of spermiogenesis.
We evaluated variants in the following genes, which are already known to be essential for fertility: ANKRD31 (MIM: 618423), BRDT (MIM: 602144), EXO1 (MIM: 606063), FKBP6 (MIM: 604839), DMC1 (MIM: 602721), SEPT12 (MIM: 611562), and MSH4 (MIM: 602105). Fig. S1 illustrates the stages of mouse spermatogenesis during which they function, and the key functions of these genes in spermatogenesis are briefly described below. To select potentially deleterious missense variants for functional evaluation in mice, we considered allele frequency (AF) together with either in silico prediction of pathogenicity or a genetic assay for protein interaction (Fig. 1A). We prioritized missense SNPs present in the gnomAD database and segregating in human populations at an AF below 1%. Except for the SEPT12 alleles, candidate deleterious variants were identified using the SIFT and PolyPhen2 pathogenicity prediction algorithms. To narrow the candidates to a manageable number, we further limited variants to those with an AF above 0.01%, yielding 19 putatively deleterious missense SNPs in 5 genes (FKBP6, EXO1, MSH4, BRDT and ANKRD31) (Fig. 1B).
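The AF window and in silico filter can be sketched as below. Two choices are assumptions rather than stated criteria: that a variant must be called deleterious by both SIFT and PolyPhen2 (switchable with `require_both`), and that PolyPhen2 "possibly damaging" counts as damaging. Record keys and IDs are hypothetical, and the separate DMC1 (lower AF) and SEPT12 (Y2H) branches of Fig. 1A are not modeled.

```python
from typing import Dict, List

DAMAGING_POLYPHEN = {"probably_damaging", "possibly_damaging"}  # assumed set


def prioritize(variants: List[Dict], af_min: float = 1e-4, af_max: float = 1e-2,
               require_both: bool = True) -> List[str]:
    """Return IDs of missense SNPs passing the AF window and in silico filters.

    Each variant is a dict with hypothetical keys 'id', 'af' (gnomAD allele
    frequency), 'sift' and 'polyphen' (categorical predictions).
    """
    selected = []
    for v in variants:
        # Keep 0.01% < AF < 1% only.
        if not (af_min < v["af"] < af_max):
            continue
        sift_hit = v["sift"] == "deleterious"
        polyphen_hit = v["polyphen"] in DAMAGING_POLYPHEN
        passes = (sift_hit and polyphen_hit) if require_both else (sift_hit or polyphen_hit)
        if passes:
            selected.append(v["id"])
    return selected


candidates = [
    {"id": "var1", "af": 5e-3, "sift": "deleterious", "polyphen": "probably_damaging"},
    {"id": "var2", "af": 5e-5, "sift": "deleterious", "polyphen": "probably_damaging"},
    {"id": "var3", "af": 2e-2, "sift": "deleterious", "polyphen": "possibly_damaging"},
    {"id": "var4", "af": 1e-3, "sift": "tolerated", "polyphen": "probably_damaging"},
]
print(prioritize(candidates))  # ['var1']
```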
A) Overview of the pipeline used for filtering human SNP databases. For DMC1, rare variants with a minor allele frequency (MAF) below 0.01% were subjected to in silico prediction, and rs189722264 was selected for mouse modeling. For SEPT12, rare variants (0.01% < MAF < 1%) were screened by Y2H before mouse modeling. B) Distribution of benign and deleterious nsSNPs (0.01% < MAF < 1%) predicted by SIFT and PolyPhen2 in the indicated reproduction genes. C) Y2H screening of SEPT12 nsSNPs for disruption of its interactions with the SEPT1 and SEPT5 proteins. N.A., not analyzed.
Mutations that disrupt protein-protein interactions (PPIs) are enriched among the molecular causes of human genetic diseases 12. To explore whether PPI disruption might be an effective molecular screen for potentially pathogenic infertility variants, we performed yeast two-hybrid (Y2H) assays on SEPT12 variants to determine whether the altered amino acids disrupted known interactions with the other Septins SEPT1 and SEPT5 26–28. We screened 5 variants, 4 of which disrupted a PPI (Fig. 1C). One of these, p.Gly169Glu encoded by rs138628476, was previously reported by our groups to cause male subfertility in a mouse model 9. The other 3 PPI-disrupting variants were added to our candidate list.
Phenotypic analysis of mouse models
Based on our variant prioritization, Y2H screening, and resource capacity, we chose 9 missense variants for functional testing by mouse modeling (Table 1). These include 1 variant in DMC1 that met the criteria described above except that its AF was below the 0.01% cutoff. Using CRISPR/Cas9-mediated genome editing in zygotes, we successfully generated 8 mouse lines modeling the corresponding human amino acid variants (Fig. S2). To identify potential reproductive defects, the animals were bred to homozygosity and tested for several fertility parameters as described previously 5–10, including testis weight, gonad histology, and sperm counts in males. For genes with known roles in meiosis, we also analyzed prophase I chromosomes in spermatocyte surface spreads by immunolabeling key proteins. Fertility trials of males and females were performed in all cases. Brief descriptions of the genes studied and of the phenotypes of the mouse models are presented below.
DMC1 (DNA Meiotic Recombinase 1) is a meiosis-specific homolog of E. coli RecA that is expressed in leptotene-to-zygotene spermatocytes, the stages at which homologous chromosome pairing is initiated. After meiotic DSBs are generated and exonucleolytically processed to expose 3’ overhangs bound by the single-stranded DNA binding protein RPA, DMC1 and the related recombinase RAD51 catalyze strand exchange, promoting homolog recognition and pairing. Male and female mice null for Dmc1 are sterile because failed DSB repair and failed interhomolog synapsis trigger prophase I arrest and checkpoint-mediated elimination of meiocytes 29, 30. We generated mice modeling the missense SNP rs189722264 (g.8500G>T; p.Gly50Cys) (Fig. S2A), which was predicted to be highly deleterious by both SIFT and PolyPhen2 (Table 1). In contrast to mice bearing a frameshift (presumably null) allele generated in the same editing experiment, which were azoospermic as expected, Dmc1G50C/G50C female and male mice had normal fecundity (Fig. 2A). Mutant males had testis sizes and sperm counts similar to those of WT siblings (Fig. 2B and C), and histology revealed normal spermatogenesis, unlike the maturation arrest observed in Dmc1-/- mice (Fig. 2D).
A) Litter sizes from mating of Dmc1G50C/G50C (GC/GC), Fkbp6F72L/F72L (FL/FL), Exo1S465Y/S465Y (SY/SY) and Msh4T410M/T410M (TM/TM) females and males to WT partners. N=2 for GC/GC, FL/FL and SY/SY male and female mice, N=3 for TM/TM male and female mice. B) Testis weights of 2-month-old mice. C) Sperm counts in mutant homozygotes and littermate controls. D) Cross-sections of testes showing histology comparable to that of littermate controls. Scale bars = 100 μm. Data in A, B and C are represented as the mean ± SEM and were analyzed using a two-tailed unpaired t-test.
FKBP6 (FK506 binding protein 6) encodes a cis-trans peptidyl-prolyl isomerase that contains the amino acid altered by rs3750075 (g.6245C>A; p.Phe72Leu) (Fig. S3). The gene functions in immunoregulation and in basic cellular processes involving protein folding and trafficking. FKBP6 is expressed in several tissues 31 and localizes to meiotic chromosome cores and regions of homologous chromosome synapsis 32. Despite its expression in both testis and ovary, its deficiency causes only male infertility 32, 33. FKBP6 variants have been associated with non-obstructive azoospermia and idiopathic infertility 34, 35. Fertility tests of Fkbp6F72L homozygotes revealed normal fecundity (Fig. 2A), testis weights, sperm counts, and testis histology (Fig. 2B-D).
EXO1 (Exonuclease 1) is involved in the repair of DNA mismatches and DSBs. Loss of Exo1 causes cancer predisposition and infertility in mice 36, the latter related to premature separation of meiotic homologs before alignment at the metaphase plate 37. The SNP rs4149964 changes an amino acid residue (g.28941C>A; p.Ser465Tyr) in the MLH1 interaction domain (Fig. S3). Exo1S465Y/S465Y males and females showed normal fecundity (Fig. 2A), and all male parameters were normal (Fig. 2B-D).
MSH4 (MutS protein homolog 4) is a mismatch repair protein essential for meiotic recombination. Male and female mice lacking Msh4 are infertile due to meiotic failure 38. We modeled rs141042002 (g.56405C>T; p.Thr410Met), which is located in the DNA-binding domain (Fig. S3). The Msh4T410M/T410M mutants did not reveal any reproductive phenotypes in males or females (Fig. 2A-D).
BRDT (bromodomain testis associated) belongs to the bromodomain and extra-terminal (BET) family of proteins. It is expressed in pachytene spermatocytes (meiocytes in which chromosomes are fully synapsed) and round spermatids. It contains two conserved bromodomains (BD1 and BD2) involved in recognizing acetylated histones, and an extra-terminal (ET) motif region involved in PPIs 39–41. Mice bearing a deletion of BD1 have oligoasthenoteratozoospermia 42, whereas a null allele is more severe, causing complete meiotic arrest 43. As an epigenetic regulator, BRDT is required for proper meiotic chromatin organization, meiotic sex chromosome inactivation (MSCI), and normal crossover (CO) formation 44.
We modeled the human BRDT p.Asp550His allele, encoded by SNP rs141699970, which resides in the ET domain (Figs. S2E and S3). A mouse line with a 62-nt deletion was also recovered and expanded to serve as a knockout allele (Brdt-) for comparison (Fig. S4). As previously reported 43, Brdt-/- males were infertile; however, BrdtD550H/D550H males sired litters comparable in size to those of heterozygous littermates when mated to WT females (Fig. 3A) and, in contrast to null males, had normal testis weights, sperm numbers and testis histology (Fig. 3B-D). Both Brdt-/- and BrdtD550H/D550H females were fertile (Fig. 3A), as expected since Brdt is expressed exclusively in the testis 45. TUNEL staining revealed more apoptotic cells in the seminiferous tubules of BrdtD550H/D550H mice than in controls, but far fewer than in Brdt-/- mice (Fig. 3E and F). Depletion of BRDT weakens the H3K9me3 signal in the pseudoautosomal region (PAR) of the sex chromosomes in diplonema 44. We confirmed this in Brdt-/- spermatocytes but observed no difference between BrdtD550H/D550H and control spermatocytes (Fig. 3G). However, BrdtD550H/- mice exhibited slightly reduced testis weights, metaphase I-arrested spermatocytes, and elevated numbers of apoptotic spermatogenic cells, despite having comparable sperm counts and normal H3K9me3 modification in the PAR (Fig. 3B-G). Overall, we classify BRDTD550H as benign for fertility, although it may be a weak hypomorph.
A) Litter sizes from mating of BrdtD550H/D550H (DH/DH), BrdtD550H/+ (DH/+), Brdt+/- (+/-) and Brdt-/- (-/-) animals to WT partners. N=2 for DH/DH male and female mice, N=3 for -/- male and female mice. B) Testis weights of 2-month-old males. C) Epididymal sperm counts (Methods). D) Histological sections of 2-month-old testes and cauda epididymides. Round germ cells were present in -/- epididymides. Scale bars = 100 μm. Black arrowheads in inset indicate meiotic metaphase I-arrested cells. E) Quantification of TUNEL+ cells per tubule. F) Number of tubules containing >4 TUNEL+ cells. No control (DH/+) tubule section contained >4 TUNEL+ cells. N=3 mice. G) Immunolocalization of H3K9me3 and SYCP3 on meiotic prophase I chromosomes in the indicated spermatocyte stages. Dashed boxes highlight XY bodies. Data in A, B and C are represented as the mean ± SEM and were analyzed using one-way analysis of variance (ANOVA) with Tukey’s post hoc test.
ANKRD31 (Ankyrin repeat domain 31) contains two separate triplets of Ankyrin repeats and three conserved domains: a predicted coiled-coil domain and two regions without functional predictions 46. Male mice lacking Ankrd31 are sterile owing to delayed recombination initiation, altered DSB distribution, and failed recombination in the PAR of the sex chromosomes, where ANKRD31 is needed for DSB formation 46–48. Ankrd31-null female mice are fertile but exhibit premature ovarian insufficiency (POI) 46, 47. We modeled human rs150791065 (g.72881A>G; p.Thr557Ala; Fig. S2F), which is located within an ANK1 domain (Fig. S4). Fertility tests demonstrated that Ankrd31T557A/T557A mice of both sexes had normal fecundity (Fig. 4A), and homozygous males also had normal testes and sperm counts (Fig. 4B-D). Nevertheless, because Ankrd31 is essential for recombination between the sex chromosomes 46, we next examined metaphase chromosomes of the mutants for this possible defect. Chiasmata were present between each pair of homologs and between the sex chromosomes in Ankrd31T557A/T557A spermatocytes (Fig. 4E). Finally, because null females are fertile despite having a reduced primordial follicle pool 46, 47, we quantified follicles in 3- and 12-wk-old mutant ovaries but found no difference compared with controls (Fig. 4F and G).
A) Litter sizes from matings of Ankrd31T557A/T557A (TA/TA) and Ankrd31T557A/+ (TA/+) animals to WT partners. N=3 for TA/TA male and female mice. B) Testis weights of 2-month-old mice. C) Sperm counts. D) Histological analyses of 2-month-old testes and cauda epididymides. Scale bars = 100 μm. E) Meiotic metaphase I spermatocyte chromosome preparations. Arrows indicate X and Y chromosomes. F) Histological analyses of 2-month-old ovaries. Scale bars = 100 μm. G) Follicle counts summed across every fifth serial section. Data in A, B, C and G are represented as the mean ± SEM and were analyzed using a two-tailed unpaired t-test.
SEPT12 belongs to the Septin family of GTP-binding proteins and is expressed exclusively in the testis. It localizes around the manchette and the neck region of elongated spermatids, as well as to the annulus of mature sperm. Missense mutations in SEPT12 have been implicated in sperm defects and infertility in men 49. The phenotype of mice lacking Sept12 is unclear. The first reported attempt to create mouse mutants did not achieve germline transmission, but most chimeras bearing a null allele were infertile, with a variety of testicular phenotypes 50. It is uncertain whether these phenotypes reflected haploinsufficiency for Sept12 or unrelated defects in the embryonic stem cells in which targeting was performed. However, a phosphomimetic Sept12 allele caused subfertility in mice, associated with decreased sperm count and motility 50.
The humanized alleles described thus far were selected because algorithmic predictions classified them as likely pathogenic with respect to protein function. Because none had any effect on fertility, we attempted to increase the likelihood that selected alleles would be consequential by first determining whether they disrupted a known PPI. Accordingly, we selected 4 SEPT12 alleles that had an AF < 1% and disrupted at least 1 PPI. We previously modeled one such allele (SEPT12G169E/G169E) in mice and found that it indeed caused subfertility and poor sperm motility 9. We failed to generate a mouse model for another (Table 1), but generated and characterized two mouse alleles, Sept12V160M and Sept12V220I, corresponding to human rs144420035 (g.9563G>A) and rs142721632 (g.9998G>A), respectively. Both mutations are located within the GTPase domain (Fig. S3). Males homozygous for either allele were fertile, with normal fecundity (Fig. 5A), testis weights, sperm counts, and testis histology (Fig. 5B-D). Thus, of the 3 PPI-disrupting SEPT12 alleles modeled in mice, only 1 proved to be pathogenic.
A) Litter sizes from mating of WT (+/+), Sept12V162M/V162M (VM/VM) and Sept12V222I/V222I (VI/VI) males to WT partners. N=2 for VM/VM and VI/VI mice. B) Testis weight of 2-month-old mice. C) Sperm counts. D) Histological analyses of 2-month-old testes. Scale bars = 100 μm. Data in A, B and C are represented as the mean ± SEM and were analyzed using one-way ANOVA with Tukey’s post hoc test.
Performance of pathogenicity prediction algorithms in identifying disease-causing variants
To explore why such a high fraction of predicted deleterious variants failed to cause reproductive phenotypes in mice, we evaluated the effectiveness of 10 different pathogenicity prediction algorithms in classifying 26 human alleles in 15 essential fertility genes (including those presented here) that our group has modeled in mice 5–10. We use the terms ‘deleterious’ and ‘benign’ to refer to variants that are or are not associated with disease, respectively.
Unsurprisingly, the predictors performed differently, as they use different underlying criteria. Of the 26 alleles, 24 were classified as deleterious by both SIFT and PolyPhen, but only 12 caused a phenotype in mice. The REVEL (Rare Exome Variant Ensemble Learner) 51 and CADD (Combined Annotation Dependent Depletion; a cutoff score of >25 was used) 52 algorithms performed best (Fig. 6A). In contrast to the low accuracy of computational predictions alone, in vitro assays that revealed protein defects (instability, altered function, or PPI disruptions) correctly predicted mouse phenotypes in 10 of 14 cases. Of the 12 alleles classified as deleterious by both PolyPhen and SIFT for which no in vitro data were available, only 2 caused a mouse phenotype (Figs. 6A and S5).
A) In silico prediction outcomes of 10 commonly used algorithms for 26 variants functionally interpreted in mouse models. The in vitro experiments on the DMC1_M200V variant were performed by Hikiba et al., 2008. All variants below the dotted line were tested by at least one in vitro pre-screening experiment. N.A., not available. B) Overview of how the Mouse_all and Mouse_infertility datasets were derived. For missense variants modeled in mice, only those with SNP rsIDs were considered. C) Prediction accuracy (ACC), D) positive predictive value (PPV) and E) Matthews correlation coefficient (MCC) of predictors using the ClinVar_infertility (n=697; only pathogenic and benign variants were used), Mouse_all (n=102) and Mouse_infertility (n=34) datasets. F) Receiver operating characteristic (ROC) curves of the 10 predictors for the three datasets; area under the curve (AUC) values are given in brackets.
The disparity between the predicted and observed effects of these missense variants led us to examine the algorithms’ performance in more depth. We built a dataset (“Mouse_all”) comprising 102 human missense variants that have been modeled in mice and that reside in genes involved in reproduction or other processes (Materials and Methods; Fig. 6B; Table S2). This group was subdivided into two datasets (Mouse_all_D and Mouse_all_B, where D=deleterious and B=benign) based on whether the mouse models did or did not show a pathogenic phenotype, respectively (Fig. S6A). For a “control” dataset, we retrieved infertility-related missense variants from the ClinVar database that had been pre-categorized into five classes: ClinVar_B (benign), ClinVar_LB (likely benign), ClinVar_VUS, ClinVar_LP (likely pathogenic), and ClinVar_P (pathogenic) (Fig. S6A). For each of these datasets, we obtained prediction scores from 10 algorithms as compiled in dbNSFP (database for nonsynonymous SNP functional predictions) 24. The prediction score distributions of SIFT, DANN, GERP++ RS and MutationTaster were similar across all categories (Fig. S6B), indicating that these tools failed to discriminate the variants functionally. Next, we focused on the ClinVar_B and ClinVar_P datasets (n=697 combined, collectively referred to as “ClinVar_infertility”; Fig. S6A), which represent the extremes of ClinVar classifications. Whereas the predictions of all the tools agreed well with the pathogenic (ClinVar_P) dataset, only REVEL and CADD had relatively low FP (type I error) rates (7.95% and 8.3%, respectively) for the benign (ClinVar_B) variants. MutationTaster performed the worst (FP rate of 94.14%; Fig. S6C). Surprisingly, all of the prediction tools had higher FP rates (39.29% to 89.29%) for the Mouse_all_B group than for the ClinVar_B (infertility) group (Fig. S6C), consistent with a previous study showing that prediction tools have high FP rates 53.
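The FP (type I error) rate used above is the fraction of known-benign variants that a predictor calls deleterious. A minimal Python sketch follows; the scores are invented placeholders rather than data from this study, and treating a CADD score strictly above 25 as a deleterious call simply mirrors the cutoff stated earlier. The last line is plain arithmetic: assuming the Mouse_all_B group is the set of 28 Mouse_all benign variants mentioned in the ClinVar comparison, the reported range of 39.29% to 89.29% corresponds to 11 and 25 miscalled variants.

```python
# Minimal sketch: false-positive (type I error) rate of a thresholded predictor.
# All scores here are hypothetical placeholders, not values from this study.


def false_positive_rate(benign_scores, cutoff):
    """Return the fraction of known-benign variants scored above `cutoff`,
    i.e. called deleterious: FP / (FP + TN)."""
    if not benign_scores:
        raise ValueError("need at least one benign variant")
    false_positives = sum(1 for score in benign_scores if score > cutoff)
    return false_positives / len(benign_scores)


# Hypothetical CADD-like phred scores for eight known-benign variants.
benign_cadd = [12.1, 18.4, 26.3, 9.7, 31.0, 22.5, 15.2, 27.9]
fpr = false_positive_rate(benign_cadd, cutoff=25)
print(f"FP rate at CADD > 25: {fpr:.2%}")  # 3 of 8 exceed 25 -> 37.50%

# Arithmetic check: 11/28 and 25/28 benign variants miscalled.
print(f"{11 / 28:.2%} to {25 / 28:.2%}")  # 39.29% to 89.29%
```

Because the rate is computed only over benign variants, it says nothing about how well a tool detects truly deleterious alleles; the combined metrics in the next paragraph address that.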
We next compared the performance of the pathogenicity predictors on the ClinVar_infertility and Mouse_all datasets in terms of prediction accuracy (ACC), positive predictive value (PPV), Matthews correlation coefficient (MCC) and area under the receiver operating characteristic (ROC) curve (AUC). Consistent with previous comparisons using ClinVar variants 54, REVEL was superior to the other predictors for the ClinVar_infertility dataset (ACC=0.92, PPV=0.88, MCC=0.83 and AUC=0.979; Fig. 6C-F). However, all the predictors performed worse on the Mouse_all dataset than on the ClinVar_infertility dataset (Fig. 6C-F). The MCC scores for the Mouse_all dataset approached 0, indicating that mouse phenotypes and in silico interpretations are essentially uncorrelated (Fig. 6E). We next focused on a “Mouse_infertility” subset containing 34 functionally tested variants from the Mouse_all dataset (Fig. 6B). Each predictor performed similarly on the Mouse_all and Mouse_infertility datasets (Fig. 6C-F).
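For readers less familiar with these metrics, the sketch below computes ACC, PPV and MCC from a binary confusion matrix, and AUC as the probability that a randomly chosen deleterious variant outscores a randomly chosen benign one (ties counted as one half), which equals the area under the ROC curve. It is a minimal illustration, not the pipeline used in this study: the labels, scores and the 0.5 cutoff are hypothetical, and returning MCC = 0 when a row or column of the confusion matrix is empty is an assumed convention.

```python
import math


def confusion_counts(labels, calls):
    """Count TP, FP, TN, FN; 1 = deleterious, 0 = benign."""
    pairs = list(zip(labels, calls))
    tp = sum(1 for y, c in pairs if y == 1 and c == 1)
    fp = sum(1 for y, c in pairs if y == 0 and c == 1)
    tn = sum(1 for y, c in pairs if y == 0 and c == 0)
    fn = sum(1 for y, c in pairs if y == 1 and c == 0)
    return tp, fp, tn, fn


def acc_ppv_mcc(labels, calls):
    tp, fp, tn, fn = confusion_counts(labels, calls)
    acc = (tp + tn) / (tp + fp + tn + fn)
    ppv = tp / (tp + fp) if tp + fp else float("nan")  # undefined with no positive calls
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # assumed convention for an empty margin
    return acc, ppv, mcc


def roc_auc(labels, scores):
    """AUC via pairwise ranking (Mann-Whitney): P(deleterious score > benign score)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = mouse phenotype observed; REVEL-like scores in [0, 1].
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2]
calls = [1 if s > 0.5 else 0 for s in scores]  # hypothetical cutoff of 0.5

acc, ppv, mcc = acc_ppv_mcc(labels, calls)
auc = roc_auc(labels, scores)
print(f"ACC={acc:.3f} PPV={ppv:.3f} MCC={mcc:.3f} AUC={auc:.3f}")
# ACC=0.667 PPV=0.667 MCC=0.333 AUC=0.778
```

MCC uses all four cells of the confusion matrix, so a value near 0, as seen for Mouse_all, indicates that the calls track the mouse phenotypes no better than chance even when ACC or PPV looks moderate.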
To investigate why the predictors’ performance differs between the ClinVar_infertility (mostly in silico predicted variants) and Mouse_all (in vivo tested variants) datasets, we analyzed correlations between variant classifications. We retrieved ClinVar annotations for the Mouse_all dataset and found that 65/102 variants (63.7%) were reported in ClinVar, but only 31/102 (30.4%) were classified concordantly (Fig. 7). Discordant cases included 5 of 28 Mouse_all benign variants classified as “Pathogenic/Likely Pathogenic” in ClinVar, and 12 deleterious variants classified as “Benign/Likely Benign” (Fig. 7). With the caveat that variants may have different effects in humans and mice, these results raise serious questions about the clinical use of ClinVar or in silico classifications alone, and argue that in vivo functionally interpreted variants are important for accurately classifying the phenotypic consequences of genetic variants.
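The concordance figures above reduce to a simple cross-tabulation. The sketch below encodes one reading of Fig. 7, which is an assumption rather than the authors’ code: a variant counts as concordant only when a mouse Benign call meets a ClinVar “Benign/Likely Benign” entry or a mouse Deleterious call meets a “Pathogenic/Likely Pathogenic” entry, while VUS, conflicting interpretations and unreported variants count as non-concordant, and the denominator is all 102 Mouse_all variants. The example pairs are hypothetical; the final line only reproduces the percentages reported in the text.

```python
# Hypothetical sketch of the mouse-vs-ClinVar concordance count; the matching
# rule below is an assumed reading of Fig. 7, not code from this study.
CONCORDANT_CLINVAR = {
    "Benign": {"Benign/Likely Benign"},
    "Deleterious": {"Pathogenic/Likely Pathogenic"},
}


def concordance(pairs):
    """pairs: (mouse_class, clinvar_class), with clinvar_class None if unreported.

    Returns (n_reported_in_clinvar, n_concordant, n_total).
    """
    n_reported = sum(1 for _, clinvar in pairs if clinvar is not None)
    n_concordant = sum(
        1 for mouse, clinvar in pairs if clinvar in CONCORDANT_CLINVAR[mouse]
    )
    return n_reported, n_concordant, len(pairs)


# Hypothetical toy input (not the Fig. 7 data).
pairs = [
    ("Benign", "Benign/Likely Benign"),
    ("Benign", "Pathogenic/Likely Pathogenic"),
    ("Deleterious", "VUS"),
    ("Deleterious", "Pathogenic/Likely Pathogenic"),
    ("Deleterious", None),
    ("Benign", "Conflicting interpretations"),
]
print(concordance(pairs))  # (5, 2, 6)

# Percentages reported above for the 102 Mouse_all variants.
print(f"{65 / 102:.1%} reported, {31 / 102:.1%} concordant")  # 63.7% reported, 30.4% concordant
```

Counting unreported variants in the denominator is what makes the concordance (30.4%) lower than the fraction reported in ClinVar (63.7%); restricting the denominator to reported variants would give a different, higher figure.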
Variants modeled in mice (n=102) were classified as either Benign or Deleterious according to the reported phenotype description. These variants were compared with their ClinVar classifications (Benign/Likely Benign, Conflicting interpretations, VUS and Pathogenic/Likely Pathogenic). 37 of 102 variants (36.3%) were not reported in ClinVar (N.A., not available).
Discussion
WGS and WES are quickly becoming mainstream diagnostic tools for identifying the genomic causes of rare diseases and cancers 55, 56. Nevertheless, wider adoption of sequencing as a diagnostic tool for other (potentially genetic) diseases is limited by our ability to accurately classify and interpret the consequences of genetic variants. Many in silico tools have been developed to filter and prioritize variants as potentially causative for disease on the basis of their predicted impact on protein function, but actual phenotypic consequences are difficult to predict. Complicating factors include the heterogeneity of molecular mechanisms underlying diseases, the lack of deep mechanistic understanding of most genes and proteins, and the complexity of physiological systems. Here, we found that in silico prediction tools had high FP rates when used to predict whether a human missense variant causes a phenotype in mouse models. This shortcoming underscores the need to combine computational predictions with functional evidence from appropriate cellular or animal models if these predictions are to be useful for the genetic diagnosis of disease origins.
In clinical settings, in silico predictions represent one evidentiary criterion for variant interpretation under the ACMG/AMP guidelines. However, exclusive dependence upon in silico tools can lead to erroneous interpretation of variants. For example, SPATA16 p.Arg283Gln 57 and MEIOB p.Asn64Ile 58 were reported as causative variants of human globozoospermia and azoospermia, respectively, but were shown not to cause reproductive phenotypes in mice, even though the latter variant caused some meiotic defects 59, 60. Besides in silico predictors, biochemical and in vitro approaches are commonly used to evaluate the impact of variants on protein function. While these approaches help to prioritize or classify variants with greater confidence when used in conjunction with in silico predictions, they do not reveal the impact of such variants on reproductive organs or phenotypes, and are therefore unsuitable as a sole diagnostic in the clinical setting. The DMC1 homozygous variant p.Met200Val was reported to cause POI in an African woman 61, and a subsequent study reported that this allele impaired biochemical function 62. However, we found that the orthologous allele of this highly conserved gene did not impair mouse fertility 7. Here, we report that two alleles of SEPT12 did not markedly affect mouse reproduction despite disrupting interaction with other Septins in the Y2H assay. Our data emphasize the importance of animal models in the functional validation of genetic variants involved in human infertility, and highlight the challenges of truly ascertaining the physiological effects of VUS.
We concluded that the BrdtD550H variant is non-pathogenic, since homozygous mice had normal testis weight, sperm number and fertility despite an increased number of apoptotic cells in the seminiferous tubules. Interestingly, although mice bearing this allele in trans to a null allele (BrdtD550H/-) had sperm counts comparable to controls, they exhibited a subtle increase in apoptotic spermatocytes and in spermatocytes arrested at meiotic metaphase I. The mice we used were not entirely inbred, so it is possible that heterosis suppressed the phenotype. Nevertheless, fertility was consistent across replicates. These results suggest that functional prediction algorithms may correctly identify missense alleles that affect a protein in some subtle way that does not result in a pathogenic phenotype.
As part of a larger effort to identify true infertility variants segregating in people, we have used pathogenicity prediction algorithms as a screen for prioritizing missense variants. However, the work summarized here reveals the shortcomings of current predictors, despite their claimed accuracies of ∼65% to 90% 63. Given that a variety of in silico algorithms/pipelines are employed by clinical labs for prioritizing potential disease-causing variants 64, it is important to identify high-confidence classifiers that minimize both false-positive and false-negative predictions. Here, we attempted to address this issue by evaluating the performance of predictors in classifying infertility-related missense variants deposited in ClinVar. The tools varied in their agreement with ClinVar annotations, presumably reflecting differences in the feature sets and scoring methods they employ. Ensemble prediction tools or metapredictors (e.g. CADD and REVEL), which generate their predictions based on the outputs (scores) of other tools, are generally purported to have higher classification accuracies than individual tools (e.g. SIFT and PolyPhen2).
To estimate the performance of a predictor, commonly used parameters include ACC, MCC, sensitivity, specificity, negative predictive value, PPV, and AUC. Because each has limitations, we considered multiple performance metrics simultaneously, comparing variant classification performance using four of them: MCC, ACC, PPV and AUC. Consistent with a previous study 54, the consensus across these metrics showed that REVEL outperformed the other classifiers on the ClinVar_infertility dataset. However, it is important to recognize that ClinVar variants are sometimes used to train algorithms either directly (e.g. MutationTaster) or indirectly (e.g. REVEL, which was trained on HGMD data that overlap with more than half of ClinVar variants 65), and conversely, the pathogenicity of some variants entered into ClinVar is classified in part using various prediction algorithms. Such circularity between training and testing data may compromise calls 66. Hence, in vivo tested missense variants, which are classified according to whether or not the animal model has a disease-causing phenotype, most likely constitute a more clinically accurate and relevant (albeit small) test set.
Apart from these issues, there are other potential explanations for the poorer performance of predictors in classifying alleles in the Mouse_all dataset compared to those in the ClinVar_infertility dataset. One is that a significant proportion of ClinVar classifications, which are often used for algorithm training, are inaccurate. Based on data from chemical mutagenesis studies in mice, Miosge et al. demonstrated that nearly half of the de novo or rare missense mutants that were algorithmically predicted to be deleterious were in fact neutral or nearly neutral by both phenotypic and in vitro assays 53. Similarly, we found that all the predictors yielded high FP rates (type I error). A study of human variants in the ClinVar database provided evidence of extensive misclassification, and suggested that substantial numbers of misclassifications could be corrected by considering allele frequency and that input from orthogonal functional and genetic studies is crucial for improving variant classification accuracy 67. A second potential explanation is that there is a substantial disconnect between the biochemical and pathogenic effects of variants. That is, living systems have robustness or redundancies that largely mask minor biochemical or structural defects of proteins, although subtle molecular defects can ultimately lead to a decline in allele frequency through purifying selection 53. A third possible explanation is that the modest size of the Mouse_all dataset was insufficient to fully evaluate the predictors. With the development of genome editing approaches, knock-in/humanized animal models can be applied to interpret more missense variants, and the results can be used to evaluate and improve prediction tools. Finally, despite the evolutionary conservation of all the amino acids investigated here, it is possible that mice are more robust to alterations than humans.
Different levels of manual curation are applied in databases. ClinVar uses a star rating system (zero to four stars) to represent the “Review Status” of each submission; examples include “single submitter - criteria provided” (one star), “expert panel” (three stars), and “practice guidelines” (four stars). To assist users in exploring the data and evidence behind variant interpretations, several ClinVar derivative platforms have been developed. We used ClinVar Miner to identify variants of interest for evaluating the performance of pathogenicity predictors, and selected one-star variants because there are no infertility-related variants with three- or four-star ratings (no Variant Curation Expert Panel exists for infertility, precluding any variants from having these ratings). The formation of an infertility Variant Curation Expert Panel for ClinGen (https://www.clinicalgenome.org) is sorely needed to critically address the accuracy of pathogenicity calls currently present in ClinVar.
In summary, we provide in vivo evidence that 8 human variants in 7 essential reproduction genes are benign in mice, despite being predicted to have negative impacts on protein function by in silico pathogenicity prediction algorithms or the Y2H assay. We conclude that these variants are likely to be clinically benign, with the caveats that mice may be more tolerant of the protein alterations, and that these alleles may contribute to phenotypic defects when combined with variants in other genes. We compared 10 predictors and found that REVEL outperformed the others in evaluating infertility-related variants in ClinVar. However, all 10 predictors performed worse in classifying variants that were functionally tested in mouse models than in classifying those in ClinVar, possibly reflecting circularity in the classification process. The results underscore the importance of animal models in the functional validation of genetic variants involved in human infertility, and the need for more rigorous evaluation of VUS.
Supplemental information description
Supplemental data include 6 figures and 3 tables.
Declaration of interests
The authors declare no competing interests.
Web resources
ClinVar Miner: https://clinvarminer.genetics.utah.edu
Ensembl VEP: https://useast.ensembl.org/Homo_sapiens/Tools/VEP
Acknowledgments
The authors would like to thank R. Munroe and C. Abratte of Cornell’s transgenic facility for generating the mice, as well as D. Conrad and S. Wierbowski for discussions. This work is supported by grants from the National Institutes of Health (R01 HD082568 and P50 HD096723) and contract CO29155 from the NY State Stem Cell Program (NYSTEM). X.D. was supported by a postdoctoral fellowship from the Empire State Stem Cell Fund through New York State Department of Health contract C30293GG.