Splicing impact of deep exonic missense variants in CAPN3 explored systematically by minigene functional assay

Improving the accuracy of variant interpretation during diagnostic sequencing is a major goal for genomic medicine. To explore an often‐overlooked splicing effect of missense variants, we developed the functional assay (“minigene”) for the majority of exons of CAPN3, the gene responsible for limb girdle muscular dystrophy. By systematically screening 21 missense variants distributed along the gene, we found that eight clinically relevant missense variants located at a certain distance from the exon–intron borders (deep exonic missense variants) disrupted normal splicing of CAPN3 exons. Several recent machine learning‐based computational tools failed to predict splicing impact for the majority of these deep exonic missense variants, highlighting the importance of including variants of this type in the training sets during the future algorithm development. Overall, 24 variants in CAPN3 gene were explored, leading to the change in the American College of Medical Genetics and Genomics classification of seven of them when results of the “minigene” functional assay were considered. Our findings reveal previously unknown splicing impact of several clinically important variants in CAPN3 and draw attention to the existence of deep exonic variants with a disruptive effect on gene splicing that could be overlooked by the current approaches in clinical genetics.

sequenced as part of the diagnostic workup of patients with this type of muscular disease (Krahn et al., 2019). More than 400 unique pathogenic or likely pathogenic variants have been reported in the CAPN3 gene so far (359 in Leiden Open Variation Database [LOVD; accessed December 2, 2019] and 555 variants in ClinVar [version November 27, 2019]). However, given the large size of the gene (24 exons, 53 kb), previously unknown variants in CAPN3 are still being identified in diagnostic laboratories. Protein truncating variants (PTV) in CAPN3, such as frameshift-inducing indels, nonsense variants, or variants disrupting canonical splicing sites (±2 nucleotides [nt] from exon-intron junctions), are generally accepted as pathogenic loss-of-function variants, especially if CALPAIN 3 protein was absent in western blot analysis performed on a skeletal muscle biopsy sample. Assigning clinical significance to a newly identified exonic single-nucleotide variant (SNV, either synonymous or missense) is much more difficult, despite help from the American College of Medical Genetics and Genomics (ACMG) guidelines, which take into account several lines of pathogenicity evidence (Richards et al., 2015). It is important to note that synonymous variants or missense variants with low impact on protein function can still be pathogenic by disrupting splicing (Duguez et al., 2006; Kergourlay et al., 2014). There is a great risk of misinterpreting the clinical significance of these variants using standard approaches, which could potentially lead to incorrect diagnosis. Unfortunately, it is not currently known if certain exonic regions are more likely to harbor splice-affecting variants. Even though a number of algorithms predicting the splice impact of variants are now available (Rowlands, Baralle, & Ellingford, 2019), it is often difficult to estimate their validity in the absence of functional analysis.
To identify exonic single-nucleotide variants with effect on splicing and to improve the diagnostics of LGMDR1, we developed a systematic functional cell-based assay (minigene) for the majority of exons in the CAPN3 gene. We then selected a representative set of missense variants located outside of exon-intron junction sites (deep exonic variants) and tested the effect of these variants on splicing using the developed minigene assay. We observed that 8 out of 21 selected deep exonic missense variants induced abnormal splicing, leading to a change in the classification of clinical significance for half of them. Interestingly, the majority of the deep exonic variants impacting splicing were not identified by several splice-prediction algorithms tested, highlighting the critical need for robust methods of functional analysis of putative variants disrupting splicing. Thus, in addition to the direct benefit for diagnostics of LGMDR1 patients, our study draws attention to the existence of deep exonic variants with a disruptive effect on gene splicing that could be overlooked by the current approaches in clinical genetics.
2 | MATERIALS AND METHODS

2.1 | Functional splicing assay for CAPN3 gene
The minigene reporter assay (Gaildrat et al., 2010) was initially developed for 18 CAPN3 exons. As this assay is not adapted for U12-type introns, the Intron Annotation and Orthology Database (IAOD; Gault et al., 2017; Turunen, Niemelä, Verma, & Frilander, 2013) was used to identify introns of this type, leading to the exclusion of exons 19 and 20 from the analysis. As described previously (Kergourlay et al., 2014; Puppo et al., 2015), exons and approximately 150 bp of flanking introns were amplified using the Expand High Fidelity PCR System (Roche, Basel, Switzerland). Amplicons were subsequently cloned into the pCAS2 vector. Polymerase chain reaction (PCR) and digestion product purifications were performed using the NucleoSpin Gel and PCR Clean-up (Macherey-Nagel, Düren, Germany), followed by ligation using the Quick Ligation Kit (New England Biolabs, Ipswich, MA), and transformation in 10-beta Electrocompetent E. coli (New England Biolabs). Variants were introduced into the nonmutated exon constructs using the QuikChange II XL Site-directed Mutagenesis Kit (Agilent, Santa Clara, CA).

| Transcriptional study
RNAs were isolated 48 hr after cell transfection, using Trizol/chloroform and DNA-free Removal Kit (Invitrogen, Carlsbad, CA).
RNAs were reverse-transcribed into complementary DNA (cDNA) and amplified using SuperScript™ III One-Step RT-PCR System with Platinum™ Taq DNA Polymerase (Invitrogen). PCR amplifications were performed using specific primers located in exon A and exon B of pCAS2 (Kergourlay et al., 2014). PCR products were separated by electrophoresis in a 2% agarose gel stained with 0.5 μg/ml of ethidium bromide. When several transcripts were present, each of them was purified from the agarose gel and subcloned into pGEM®-T Easy Vector System I (Promega, Madison, WI) to be analyzed separately. Finally, sequencing of each transcript was performed using the BigDye® […]

CAPN3 missense variants from LOVD were assigned "Clinical Significance" values based on the classification system used in ClinVar (uncertain/conflicting, likely pathogenic, pathogenic/likely pathogenic, or pathogenic). We combined missense CAPN3 variants from LOVD and ClinVar annotated as nonbenign in both databases, producing a set of 403 variants. Of these, 381 variants were located more than two nucleotides away from an exon-intron junction (Figure 1b). The summary of the variant numbers from these sets is shown in Figure S1. Three novel CAPN3 variants from the LGMDR1 cohort at the Department of Medical Genetics at the Timone Hospital (Marseille, France), as well as three previously reported variants, were deposited in the LOVD database (Fokkema et al., 2011) as described in Table S1. As the minigene assay was developed for 16 exons of CAPN3, we selected 1-3 variants in each exon to demonstrate the performance of the assay. In total, 24 clinically relevant variants in the CAPN3 gene were selected from the pool of variants shown in Figure S1 to be analyzed by minigene assay in our study (Table S1).
We then specifically focused on 21 missense variants located outside of the canonical splice sites (±2 nt from exon junctions). These variants were not previously known to disrupt splicing, but were predicted to have an effect on splicing according to the Human Splice […] Table S1.
Four hundred and forty-seven CAPN3 missense variants found in the general population were downloaded from gnomAD v2.1.1 (Karczewski et al., 2019) using the UCSC Table Browser (Karolchik et al., 2004). Of them, 428 missense variants (i.e., exonic nonsynonymous nonprotein truncating single-nucleotide substitutions) were located outside of the canonical splice sites (±2 nt from exon junctions; Figure 1c). […] (Stephenson, Laskowski, Nightingale, Hurles, & Thornton, 2019). SpliceAI scores for the variants were obtained using […]

FIGURE 1 Deep exonic missense variants in CAPN3 gene. (a) Twenty-one clinically relevant missense variants tested by minigene assay are shown (variants with effect on splicing are in red, variants without impact are in gray). Exon-intron junctions of the CAPN3 gene as well as the functional domains of CALPAIN 3 are visualized. The CALPAIN 3 domain nomenclature is shown according to Ono et al. (2016) and Ye et al. (2018). (b) Three hundred and eighty-one clinically relevant CAPN3 variants (pathogenic, likely pathogenic and uncertain/conflicting) from LOVD and ClinVar databases are visualized along the CAPN3 gene with SpliceAI scores on the y-axis. (c) Four hundred and twenty-eight deep exonic missense variants from gnomAD database (v2.1.1) are visualized along the CAPN3 gene with SpliceAI scores on the y-axis. Variants from adjacent exons are colored differently to visualize the exon junctions. Only variants outside of ±2 nucleotides from exon borders are shown. Vertical grids in panels b and c correspond to exon boundaries. LOVD, Leiden Open Variation Database
Note: The genomic coordinates (hg19) as well as cDNA and protein positions are listed for each variant (columns "Chr," "Start," "Ref," "Alt," "cDNA," "Protein"). The classification of variants according to the modified ACMG criteria (see Section 2 for details) is shown before taking into account the results of the minigene assay (ACMG initial) and after (ACMG post minigene). The standard terms of classification from Richards et al. (2015) are used (i.e., "Pathogenic," "Likely pathogenic," "Variant of uncertain significance [VUS]"). The detailed evidence used to classify each variant is shown in Table S1. […] Table S1. Presence of an abnormally spliced transcript associated with a decrease of the normal transcript was considered as "Impact on splicing." If an abnormally spliced transcript was present, but no decrease of the normal transcript was observed for the mutated construct compared to control, the minigene assay conclusion was "Mild impact on splicing." Predictions from the following in silico prediction tools are shown: HSF, MaxEntScan, MutPred Splice, SpliceAI, CADD, REVEL. For the "MaxEntScan," "MutPred Splice," and "SpliceAI" columns, the scores predicting an impact on splicing are shown in bold.

[…] (Yeo & Burge, 2004) were obtained using http://www.umd.be/HSF3 v3.1 (Desmet et al., 2009). Variants with MaxEntScan threshold score of 3 or a score difference of more than 30% with the wild-type score were considered as having a "predicted impact on splicing." For 13 out of 24 variants tested, the output of MaxEntScan was "No result found with this matrix."

The pathogenicity of 24 functionally tested variants was scored before and after the minigene results according to the ACMG criteria (Richards et al., 2015) with the following modifications.
PP4_strong score was assigned if CALPAIN 3 protein has been previously reported to be absent on the skeletal muscle biopsy

The concise information about 21 deep exonic missense variants is shown in Table 1, while extended evidence used for pathogenicity classification of all 24 variants tested by minigene assay in this study is shown in Table S1.
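The minigene readout categories reported in Table 1 follow a simple decision rule, which can be sketched as below. This is only an illustration of the rule as worded in the table note: the function and enum names are hypothetical, and treating every construct without an abnormally spliced transcript as "No impact on splicing" is an assumption, since the text does not spell out that case.

```python
from enum import Enum


class MinigeneConclusion(str, Enum):
    IMPACT = "Impact on splicing"
    MILD_IMPACT = "Mild impact on splicing"
    NO_IMPACT = "No impact on splicing"


def minigene_conclusion(abnormal_transcript_present: bool,
                        normal_transcript_decreased: bool) -> MinigeneConclusion:
    """Map the two gel observations onto a minigene conclusion.

    abnormal_transcript_present: an abnormally spliced transcript is seen
        for the mutated construct but not for the control.
    normal_transcript_decreased: the normal transcript is reduced for the
        mutated construct compared to the control.
    """
    if abnormal_transcript_present and normal_transcript_decreased:
        return MinigeneConclusion.IMPACT
    if abnormal_transcript_present:
        return MinigeneConclusion.MILD_IMPACT
    # Assumption: without an abnormal transcript the variant is scored
    # as having no impact, whatever the level of the normal transcript.
    return MinigeneConclusion.NO_IMPACT


if __name__ == "__main__":
    print(minigene_conclusion(True, False).value)
```

In practice both inputs come from visual inspection of the RT-PCR gel and Sanger sequencing of each band, so the sketch only formalizes the final labeling step.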

| Localization of deep exonic variants with effect on splicing within exons
To explore the localization of splice-affecting exonic missense variants, we calculated distances from these variants to 5′ and 3′ ends of the exons. These variants were located in seven different exons (4, 5, 8, 9, 11, 13, and 22). For exons larger than 100 nt, we plotted the splice-affecting exonic variants based on their distance from either 5′ or from 3′ exon ends (Figure 4). […]

Thus, the majority of the variants with abnormal splicing in minigene assay were not predicted as splice-affecting by SpliceAI and MutPred Splice (6/8 and 5/8 false negatives, respectively). One variant (c.593A>G, p.(Asn198Ser) in exon 4) was predicted as splice disrupting by both algorithms, yet showed no effect on splicing in minigene assay. All eight variants with abnormal splicing in the minigene assay had MMSplice scores below threshold. We also evaluated predicted impact on protein function, using REVEL and CADD scores (Ioannidis et al., 2016; Rentzsch et al., 2019; Table 1).
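The distance calculation described at the start of this section can be sketched as follows. This is a minimal illustration, not the authors' script: the `Exon` class, the function names, and the example coordinates are hypothetical, positions are assumed to be cDNA coordinates (so the 5′ end is always the lower number), and the first exonic nucleotide is counted as being at distance 1 from the border.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    number: int
    start: int  # cDNA position of the first exonic nucleotide (inclusive)
    end: int    # cDNA position of the last exonic nucleotide (inclusive)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def distances_to_exon_ends(exon: Exon, variant_pos: int) -> tuple[int, int]:
    """Return (distance to 5' end, distance to 3' end) in nucleotides."""
    if not exon.start <= variant_pos <= exon.end:
        raise ValueError(f"position {variant_pos} is outside exon {exon.number}")
    dist_5prime = variant_pos - exon.start + 1
    dist_3prime = exon.end - variant_pos + 1
    return dist_5prime, dist_3prime


def is_deep_exonic(exon: Exon, variant_pos: int, margin: int = 2) -> bool:
    """True if the variant lies more than `margin` nt from both exon borders.

    Assumption: margin=2 mirrors the "outside of ±2 nucleotides from exon
    borders" filter quoted in the figure legends.
    """
    return min(distances_to_exon_ends(exon, variant_pos)) > margin


if __name__ == "__main__":
    # Placeholder coordinates, not a real CAPN3 exon.
    example_exon = Exon(number=0, start=101, end=250)
    print(example_exon.length, distances_to_exon_ends(example_exon, 150))
```

Working in cDNA coordinates avoids having to handle strand orientation; with genomic coordinates on the reverse strand, the 5′ and 3′ distances would need to be swapped.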
FIGURE 3 Missense CAPN3 variants with splicing defects in the minigene assay (other types of splicing abnormalities). RT-PCR gel electrophoresis results from minigene reporter assay, with each band analyzed using Sanger sequencing. […] This abnormally spliced transcript is absent from the control lane with the nonmutated construct. The higher band (indicated with an asterisk, *) corresponds to the normal transcript containing exon 13. The normal transcript seems to be present at similar levels for mutated as well as for control constructs, leading to the conclusion of "Mild impact on splicing." A third band (indicated with §) corresponds to the exon 13 missing the last 87 nt (no frameshift). This band is present at high levels in the control and at lower levels in the assay with the mutated construct. (c) Minigene assay for the variant c.2306G>A, p.(Arg769Gln) showing splicing patterns different from that of the control construct. The lower band (indicated with a circle, o) corresponds to the transcript with skipped exon 22 (same as empty minigene). This abnormally spliced transcript is absent from the control lane with the nonmutated construct. The higher band (indicated with an asterisk, *) corresponds to the normal transcript containing exon 22. The normal transcript seems to be present at similar levels for mutated as well as for control constructs, leading to the conclusion of "Mild impact on splicing." A third band (indicated with §) corresponds to the transcript containing only the first 43 nt of exon 22. This band, corresponding either to a minigene assay artifact or to a minor noncanonical natural splice form, is present in the control and absent in the assay with the mutated construct. RT-PCR, reverse transcription-polymerase chain reaction

FIGURE 4 Distance from exonic borders of deep exonic variants with splice impact in minigene assay. Eight missense variants with impact on splicing (red) are visualized along the exons of CAPN3 gene. Distances from both 5′ and 3′ ends are shown for smaller exons (<100 bp)

FIGURE 5 Performance of algorithms predicting impact of splicing for deep exonic variants. Results of three splice prediction computational tools are shown for 21 CAPN3 missense variants tested by minigene assay. The variants are separated into three groups depending on the minigene result: "Impact," "Mild impact," or "No impact" on splicing. The vertical axis represents the predicted impact on splicing. The score cutoff above which a variant is predicted to affect splicing is 0.2 for SpliceAI and 0.7 for MutPred Splice. The prediction score values are also listed in Table 1. VUS, variants of unknown significance

All eight missense variants with impact on splicing were also predicted to be deleterious at the protein level. Indeed, five variants (p.Thr184Met and p.Thr184Arg (exon 4), p.Met248Arg (exon 5), p.Arg355Gly (exon 8), and p.Glu396Gly (exon 9)) affected conserved residues of the calpain-type cysteine protease conserved (CysPc) domain. Two variants (p.Arg489Gln (exon 11) and p.Arg541Gln (exon 13)) changed conserved amino acids located in the calpain-type β-sandwich (CBSW) domain, while p.Arg769Gln (exon 22) affected a conserved amino acid that is expected to participate in protein-protein interactions in the penta-EF-hand (PEF) domain.
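The comparison between predictor scores and minigene results summarized in the Figure 5 legend can be sketched as below. The two cutoffs (0.2 for SpliceAI, 0.7 for MutPred Splice) are the ones quoted in the legend; everything else is an assumption made for illustration, including the record layout, the function names, and the example scores, which are invented placeholders and not values from Table 1.

```python
from typing import Optional

# Cutoffs quoted in the Figure 5 legend.
CUTOFFS = {"spliceai": 0.2, "mutpred_splice": 0.7}


def predicted_splice_affecting(score: Optional[float], cutoff: float) -> bool:
    """A variant is called splice-affecting if its score is above the cutoff.

    Assumption: a missing score (None) is treated as "not predicted".
    """
    return score is not None and score > cutoff


def count_false_negatives(records: list[dict], tool: str) -> tuple[int, int]:
    """Return (missed, total) among variants with abnormal minigene splicing."""
    cutoff = CUTOFFS[tool]
    abnormal = [r for r in records if r["minigene_abnormal"]]
    missed = sum(
        1 for r in abnormal
        if not predicted_splice_affecting(r.get(tool), cutoff)
    )
    return missed, len(abnormal)


if __name__ == "__main__":
    # Placeholder records with invented scores, for illustration only.
    example_records = [
        {"variant": "var_A", "minigene_abnormal": True,
         "spliceai": 0.05, "mutpred_splice": 0.80},
        {"variant": "var_B", "minigene_abnormal": True,
         "spliceai": 0.45, "mutpred_splice": 0.30},
        {"variant": "var_C", "minigene_abnormal": False,
         "spliceai": 0.01, "mutpred_splice": 0.10},
    ]
    for tool in CUTOFFS:
        missed, total = count_false_negatives(example_records, tool)
        print(f"{tool}: {missed}/{total} false negatives")
```

Applied to the scores in Table 1, a tally of this kind is what yields false-negative fractions such as those reported for SpliceAI and MutPred Splice.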
3.4 | Pathogenic classification for 7 out of 24 variants was modified based on the results of the minigene assay
As most of the variants selected for testing by minigene were identified before the ACMG recommendations for assessing variant pathogenicity, we classified each variant according to the ACMG criteria modified for the CAPN3 gene as described in Section 2. The details of evidence used to classify each variant are available in Table S1. The results of the minigene assay, which allowed the additional PS3 criterion to be assigned, were then taken into account to calculate the "post minigene ACMG classification" for each variant.

Several studies have previously drawn attention to the possibility of exonic variants disrupting normal splicing (Savisaar & Hurst, 2017).
For example, missense variants in DYSF and BRCA genes have been shown to be pathogenic through their impact on splicing (Kergourlay et al., 2014; Théry et al., 2011). Recent studies have shown significant variability in mRNA splicing patterns between individuals, as well as between human populations in general (Lu, Jiang, & Xing, 2012; Park, Pan, Zhang, Lin, & Xing, 2018). Thus, one could imagine that the frequency of a con- […] variant has long been considered as pathogenic, responsible for milder forms of LGMDR1 (Anderson et al., 1998; Barp et al., 2020; Fanin et al., 2004; Richard et al., 1999). This variant had a very low frequency in the initially tested subpopulations (0.001% in Europeans, 2 in 129164 alleles in gnomAD v2.1.1). However, with more sequencing data now available from different populations, it turns out that this variant has a 3.3% frequency in African and 0.09% in Latino populations, leading to its reclassification as Benign or Likely benign by many diagnostic laboratories (see Table S1 for more details). […]

Six variants with impact on splicing in our minigene assay induce skipping of an exon that would lead to frameshifting downstream of this exon, predicted to induce nonsense-mediated decay (NMD).
These variants are therefore expected to lead to a decrease of the CALPAIN 3 protein in the muscle cell. Indeed, protein decrease on western blot analysis has been reported in eight patients carrying variants with splice impact on minigene assay (see Table S1). […]

However, these prediction tools were designed and tested using the already known splice variants, most of which are intronic or located at canonical splice sites. In this study, we focused on potential splice-disrupting missense variants located deeper in exons, a group of variants that have been relatively little explored (Savisaar & Hurst, 2017). We found the performance of the tools tested suboptimal, as they failed to predict the splicing effect for more than half of the variants showing impact on splicing in the minigene assay. Our results are consistent with a previous study of BRCA1 and BRCA2 in which exonic variants with impact on splicing in a minigene assay were not predicted to be splice-disrupting by five different algorithms (Théry et al., 2011). Taken together, these findings suggest that exonic variants with effect on splicing might be more difficult to predict using currently available algorithms. The splice-disrupting exonic variants identified in this study could contribute to a new data set of variants of this type for developing more efficient splice-predicting tools.
Exonic elements regulating splicing consist of exonic splice enhancers (ESEs) that promote the splicing of the exon and exonic splice silencers (ESSs) that induce exon exclusion from the mature mRNA (Ohno, Takeda, & Masuda, 2018). The density of these elements tends to rise toward the exon-intron junctions (Fairbrother, Holste, Burge, & Sharp, 2004; Woolfe, Mullikin, & Elnitski, 2010), thus increasing the probability that a variant present in these regions affects splicing. Interestingly, seven out of eight splice-disrupting missense variants identified in this study were located more than 30 nt away from the exon border. The distribution of regulatory splicing information in the deep exonic regions is less clear. Two models have been previously proposed for functional splice elements in exonic regions: rare regions under strong purifying selection versus multiple lowly constrained regions (Savisaar & Hurst, 2017).
Our results are more consistent with the second model, as none of the exonic variants with impact on splicing described in our study are located in regions of high constraint (CCR>90; Havrilla, Pedersen, Layer, & Quinlan, 2019). Indeed, only a small region of exon 5 of CAPN3 is constrained using this method, most likely due to the fact that the vast majority of CAPN3 pathogenic variants are responsible for recessive disease.
Our results are consistent with these findings (8 splice-affecting variants out of 21 tested by minigene), even though all 21 variants tested were predicted to be splice-modifying by HSF, thus overestimating the proportion of splice-affecting exonic variants. It is important to note that the minigene assay remains an in vitro approach that should ideally be confirmed by studies using patient-derived material, such as RT-PCR or RNAseq. The advantage of in vitro studies is that the impact of different variants can be compared without the need to consider the genetic background of the assay.
Indeed, recent guidelines on assigning functional evidence scores (PS3/BS3) in the ACMG classification framework treat these two lines of evidence differently and recommend using PP4 score for patient-derived data, while assigning PS3/BS3 for evidence obtained from validated in vitro functional assays (Brnich et al., 2019).
Accessibility and wide use of high-throughput sequencing for limb girdle muscular dystrophy diagnosis has led to identification of multiple variants of uncertain pathogenicity, increasing the demand for efficient functional studies. For LGMDR1, presence of CALPAIN 3 protein on western blot analysis or screening for abnormal transcripts in skeletal muscle RNA has proven to be useful to achieve accurate molecular diagnosis. However, both of these approaches require access to patient's skeletal muscle biopsy material. A useful in vitro alternative is the well-established minigene assay. It has been previously shown that the results of minigene assays are concordant with the analysis of patient RNA (Bonnet et al., 2008;Théry et al., 2011;Tournier et al., 2008). Here, we developed the minigene approach for the majority of CAPN3 exons and used it to test 24 variants. The results of the minigene assay allowed assigning additional evidence points according to the ACMG recommendations (Richards et al., 2015) leading to change in pathogenicity classification for seven variants. These results demonstrate the utility of the minigene functional assay for diagnostics of calpainopathy and reinforce the need to incorporate functional studies into diagnostic process on a larger scale.
The functional assay described here can be directly used by diagnostic laboratories to test CAPN3 variants with suspected splicing effect, making it possible to establish a long-awaited diagnosis for many LGMDR1 patients.

ACKNOWLEDGMENTS
We thank Alexandra Martins and Christiane Duponchel for generously providing the pCAS2 vector. This study was supported by GIPTIS (Genetics Institute for Patients, Therapies Innovation & Science). This study was funded by AFM (L'Association française contre les myopathies). AS received a fellowship from FRM (Fondation recherche medicale), project number ECO20170637467.

CONFLICT OF INTERESTS
The authors declare that there is no conflict of interest.

AUTHOR CONTRIBUTIONS
E. D. and A. D. had major roles in the acquisition and analysis of data, as well as in writing of the manuscript; N. D. had a major role in data acquisition; A. S. participated in data interpretation and writing of the manuscript; N. L. and M. K. revised the manuscript for content; M. B.,