Aberrant regulation of RNA splicing in sunflower hybrids may underlie intrinsic incompatibilities

Alternative spicing is an integral part of gene expression in multicellular organisms that allows for diverse mRNA transcripts and proteins to be produced from a single gene. However, most existing analyses have focused on macro-evolution, with only limited research on splice site evolution over shorter term, micro-evolutionary time scales. Here we examine splicing evolution that has occurred during domestication and observe 45 novel splice forms with strongly transgressive isoform compositions, representing 0.24% of analyzed transcripts. We identify loci associated with variation in the levels of these splice forms, finding that many novel transcripts were regulated by multiple alleles with non-additive interactions. A subset of these interactions involved the expression of individual spliceosome components. These overdominant and epistatic interactions often resulted in alteration in the protein-coding regions of the transcripts, resulting in frameshifts and truncations. By associating the splice variation in these genes with size and growth rate measurements, we found that none of the individual splice variants affected these plant traits significantly, but the cumulative expression of all aberrant transcripts did show a significant reduction in growth rate associated with higher proportions of disrupted transcripts. This demonstrates the importance of co-evolution of the different spliceosomal components and their regulators and suggests that these genes may contribute to evolution of reproductive isolation as Bateson-Dobzhansky-Muller incompatibility loci. Author summary In multicellular organisms, it is common that segments of pre-mRNA molecules are physically removed, and the remaining segments are spliced back together. Through splicing alternative combinations of segments together, organisms produce various mRNA molecules, and thus multiple proteins, using the information encoded in a single gene. Here, we investigated the RNA of two sunflower genotypes, one wild and one domesticated, as well as the hybrid offspring resulting from a cross between the two genotypes. We found certain mRNA molecules that were spliced exclusively in the hybrids and were absent in the examined parental lines. These unique hybrid mRNAs were predicted to be consequential for the hybrids’ health, and thus represented a malfunction in the mechanisms that regulate splicing. These results improve our understanding of the genetic regulation of alternative splicing and how alternative splice forms evolve. Our findings may lead to further inquiries about how aberrant splicing promotes the formation of new species in nature.


10
133 Genes with transgressive splicing patterns were distributed across all 17 chromosomes.
134 However, the number of transgressive isoforms spliced per gene was sometimes 135 greater than one: three genes each had two novel isoforms, and a single gene had 136 three novel isoforms. The majority of transgressive genes had similar splicing patterns 137 between wild and domesticated parents, although 25% were differentiated between 138 parent taxa. In Smith et al. (2018), splicing had diverged in 40% of analyzed genes. This 139 indicates that divergence in splicing of a particular gene is not a prerequisite for the 140 production of novel isoforms.  (Table 1). No two transgressive genes had the 145 same Arabidopsis homolog; however, for each gene with multiple transgressive 146 isoforms, the isoforms aligned to different splice forms of the same protein in a cleanly 147 delineated manner. The annotations spanned a diverse array of functions, and no 148 functional category appeared predominant. There were five novel isoforms that did not 149 align well to any Arabidopsis proteins, two of which had a parent isoform with a 150 homolog. The annotations for these were translation initiation factor 3 subunit I and 151 hypothetical protein.
152 Most transgressive isoforms aligned to the same protein splice variant as one of the 153 corresponding parent isoforms. This result is consistent with the transgressive isoforms 154 being previously uncharacterized, although it is reasonable to suspect that this pattern 155 is due to limitations of the alignment algorithm, or a less than comprehensive protein 156 database. Only two novel isoforms aligned to different proteins than the parent isoform.
157 In one of these cases, the transgressive isoform aligned to a different protein splice 158 variant than the parent isoform, but they had the same annotation: uncharacterized 159 protein family UPF0016, a suspected calcium transporter (Demaegd et al., 2014). In the 160 other case, the transgressive isoform aligned to a protein with a different accession than 161 the parent isoform, but only a subtly different functional annotation. The parent isoform 162 was annotated as an F-box/RNI-like superfamily protein, while the transgressive isoform 163 was most similar to the RNI-like superfamily protein.      Table S1. 558 each parent were interacting to cause transgressive isoform splicing. The proportion of 559 transgressive isoform spliced relative to overall gene expression was used as the 560 phenotype value, and if an individual had insufficient expression ( < 1 TPM) at the 561 gene-level they were assigned missing data. We limited the search to pairs of loci on 562 different chromosomes that met the following criteria: (i) at least three individuals were 31 563 represented that had non-missing phenotype data and were homozygous for the Ha89 564 allele at both loci-notated AABB-, (ii) at least three individuals were represented that 565 were homozygous for the Ann1238 allele at both loci-ȦȦḂḂ-, (iii) at least three 566 individuals were represented for at least one mixed genotype -ȦȦBB, AAḂḂ, AȦBB, 567 AABḂ, ȦȦBḂ, AȦḂḂ, or AȦBḂ. Encoding the genotypes as allele dosages-0, 1, or 2-, 568 we tested for an interaction effect between markers using a multiple linear regression.
569 We used a Bonferroni correction to obtain a significance threshold for the epistasis 570 scan: the total number of tests for all 45 phenotypes was 1.61 × 10 9 , and the 571 significance threshold became p < 3.11 × 10 -11 . Last, we applied the following post 572 hoc filter to avoid outlier effects: if the most transgressive genotype group had fewer 573 than three individuals represented, we counted an otherwise significant test as non-574 significant. We allowed a single QTL pair for each pair of chromosomes and assigned 575 the QTL peak as the marker position with the smallest p value. We delineated the QTL 576 region as the range of SNP markers with p-values within 25% of the peak in log-space: 577 p < p min (1 -0.25) .
578 Interacting QTLs were reported from the above scan only. We conducted a similar 579 epistasis scan using R/qtl with standard two-locus interval mapping, the densest genetic = 999). One thousand permutations were used to 582 obtain a null distribution for the interaction LOD score, and the 95 th percentile was used 583 as the significance threshold. This scan found a total of 177 pairs of interacting QTLs for 584 21 novel isoforms; however, all of these QTL-pairs were suspect. For a large proportion 585 of cases, it was clear that outliers had inflated statistical significance because the