Secondary structure of pre-mRNA introns for genes in the 15q11-12 locus. Mapping of functionally significant motifs for RNA-binding proteins and nucleosome positioning signals

In this study, we identified reproducible substructures in the folded structures of long intronic RNAs for recursively spliced variants and annotated pre-mRNAs of GABRB3 and GABRA5. We mapped the RNA motifs recognized by RNA-binding proteins across this locus and characterized the areas of their preferred localization. A comparison of pre-mRNA variants revealed the dominant type of potential protein effects. We determined the structural specifics of RNA in the dense Alu cluster and clarified the analogy between the apical substructure and the A-Xist fragment of the transcriptional variant. Mapping of the nucleosome potential reveals an alternation of strong and weak signals in the 3'-end portion of GABRB3 and clusters of nucleosome positioning signals in the vicinity of the Alu cluster. The distribution of simple oligonucleotides among reproducible substructures revealed an enrichment in Py-tracts; for some substructures, this may be considered complementary to the Pu-tract enrichment of the ncRNA Malat1, a component of nuclear speckles. The secondary structure elements of bidirectional transcripts are predisposed to somatic homolog pairing in this locus, as previously shown experimentally. We propose a model of the potential influence of intronic RNA on splicing, based on its interaction with Py-tract-binding RNPs, serine-arginine (SRSF) proteins and the ncRNA Malat1, as well as on the action of the Alu cluster.


The splicing model for exons surrounded by long introns is based on the assumption of a

The mapping data for sites along the nucleotide sequence of locus 15q11-12 are shown in … First, in accordance with splicing sites predicted in silico (Gene id programme), we subdivided the longest intron of the pre-mRNA (149 knt) (Table S2) … RNAs. These fragments may be considered as corresponding to recursive splicing. Fig. 3 depicts the long intronic RNA of the GABRB3 gene which, together with a core-part (Fig. 4), constitutes transcription variants 1 and 2. The core-part together with the small exons/introns 1 and 2 and the 5'UTR corresponds to variant 4. Truncation of the long 149-knt intron to 95 knt and joining it with the core-part gives rise to variant 3 … alternative splicing of starting exons) are expressed in brain at a foetal developmental stage, whereas variant 3 is largely expressed in adult brain and, to a lesser extent, in cardiomyocytes, lung, testis and muscles [Proteomics, GenBank]. Variant 4 is predominantly expressed in adult brain.

Additionally, a very long transcript is expressed in retina. According to the latest data, transcription of the locus is bi-allelic in brain, and in disease it is partially biased toward mono-allelic variants … coordinates of structural elements relative to the genomic sequence are given in Table S2 for the hg38 assembly of the Homo sapiens genome (GenBank). These folding images are further used as the basis for mapping RNA-binding protein motifs.
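As an illustration of how motif density along a sequence can be turned into the coloured spots used in the figures, the sketch below counts Py-tracts in sliding windows and classifies each window with the 20 and 25 motifs/knt thresholds quoted for the green and dark-green spots. It is a minimal sketch under assumptions not stated in the text: a Py-tract is taken to be a maximal run of at least `tract_len` pyrimidines, and the window (1000 nt) and step (500 nt) sizes are hypothetical. The authors' actual motif definitions and mapping procedure may differ.

```python
import re


def py_tract_density(seq, tract_len=8, window=1000, step=500):
    """Return (window_start, density) pairs, density in Py-tracts per knt.

    Assumption (hypothetical): a Py-tract is a maximal run of at least
    `tract_len` pyrimidines; DNA input (T) is converted to RNA (U).
    """
    seq = seq.upper().replace("T", "U")
    if not seq:
        return []
    # Start positions of maximal pyrimidine runs long enough to count
    starts = [m.start() for m in re.finditer(r"[CU]{%d,}" % tract_len, seq)]
    result = []
    for w_start in range(0, max(len(seq) - window, 0) + 1, step):
        # A window is shorter than `window` only if seq itself is shorter
        span = min(window, len(seq) - w_start)
        n = sum(1 for s in starts if w_start <= s < w_start + span)
        result.append((w_start, n * 1000 / span))
    return result


def spot_colour(density, green=20, dark_green=25):
    """Map a density (motifs/knt) to the figure colour, or None."""
    if density > dark_green:
        return "dark green"
    if density > green:
        return "green"
    return None


if __name__ == "__main__":
    demo = "CUCUCUCUA" * 120  # synthetic sequence, not locus data
    for start, dens in py_tract_density(demo):
        print(start, dens, spot_colour(dens))
```

Counting maximal runs rather than every overlapping 8-mer keeps one long Py-tract from being counted many times; a different tract definition would change the absolute densities and therefore which windows cross the thresholds.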

Second, for non-recursive folding variants, if such exist, we used the sliding window method to estimate whether folding peculiarities can be identified in long intronic RNAs, for example, intron 3 (GenBank, 149 knt) (Fig. 6). Such non-recursive folding may be realized in early interphase, when splicing is delayed relative to transcription. For each sliding window used for folding, in the resulting structure we distinguish the branches as clusters of concentrated

Altogether, this can lead to efficient processing of pre-mRNA.

According to the proteomics data (GenBank) [69], the density of SRSF1 protein in brain is at … For ease of description, we designate fragment f1 as containing the short introns, exons and transcription variant of the GABRB3 gene (end-to-end across the GABRB3 and GABRA5 genes, active in retina). Quantitatively, the SRSF1 signal density in f2 and f3+f4+f5' for long introns (S1 Fig. 1, S1) exceeds that for f1 (data for the gene-core), especially when integrating over the entire length … co-localized with more nonspecific Py-tracts. We mapped these tracts along the 15q11-12 locus. In the long intron 3 (GenBank, GABRB3), they are localized in the central part, both in the one-dimensional representation (Fig. 2T) and in the 2D representation (Fig. 3-5, 7; green spots mark tract densities above 20 motifs/knt and dark green spots densities above 25 motifs/knt), namely, in the strong peak B15 and the weaker peak B11, as well as in the inter-branch spaces (interB14-B15, interB15-B16). For the core-part of the GABRB3 gene (Fig. 4), such mapping revealed

Green spots dominate the upper part of the image (Fig. 3), that is, the centre of the long intron 3 (GenBank). In intron 4 (GenBank), the intensity of Py-rich motifs and tracts is higher than

For hnRNP G binding sites, there is a preference for CCA repeats [84]. The B6 branch of the long intron 3 (Fig. 3) contains sufficiently long repeats of similar sequence. As hnRNP G and hnRNP L binding sites may partly overlap, both are labeled by yellow spots (Fig. 3-5 … (Fig. 10G-J). In the histogram for the upper chain (Fig. 10M), the nuclei of dsAlu annealing are Alu2 (+) and Alu3 (-); according to the histogram, the interval between them is approximately 1700 nt, whereas the interval between Alu3 (-) and Alu4 (+) is approximately 400 nt. In the latter case, the statistical frequency of annealing is about twice as high as between Alu2 (+) and Alu3 (-). This difference logically follows from the assessment of the dsAlu editing rate by the ADAR enzyme [94]. This assumes an average elongation rate for long introns of ~3 knt/min. Thus, it follows that the nucleation of annealing for the whole cluster … intronic RNAs and, in this regard, our apical structure B1 also has many similarities with two stem-loop structures as well as with the A-Xist structure (compare structures (L) and (K) in Fig. 10LK).
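The inter-Alu distances above can be converted into transcription times under the stated average elongation rate. The short calculation below is only a worked example of that arithmetic: it assumes a constant rate of ~3 knt/min, ignores polymerase pausing, and does not model annealing itself.

```python
# Assumed constant elongation rate for long introns (~3 knt/min, as in the text)
ELONGATION_NT_PER_MIN = 3000


def transcription_delay_s(distance_nt, rate_nt_per_min=ELONGATION_NT_PER_MIN):
    """Seconds between synthesis of two sites `distance_nt` apart."""
    return distance_nt * 60 / rate_nt_per_min


# Inter-Alu intervals read from the histogram (Fig. 10M)
intervals = {"Alu2 (+) to Alu3 (-)": 1700, "Alu3 (-) to Alu4 (+)": 400}

for pair, nt in intervals.items():
    print(f"{pair}: {nt} nt -> ~{transcription_delay_s(nt):.0f} s")
```

With these inputs the delays are about 34 s and 8 s, respectively, so under this assumption the Alu3 (-)/Alu4 (+) partners both become available several times sooner than the Alu2 (+)/Alu3 (-) pair.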

These findings are consistent with current views on the properties of the RNA-binding site of the PRC2 complex.

An advantage of the whole structure is that, owing to its length, it can extend far into the nuclear space and, given its many degrees of freedom, scan that space, cross the nucleus and reach distant portions of the same chromosome.

Below, we show that clusters of significant nucleosome positioning signals are localized in the downstream area, which would make the functioning of the complex more efficient

The GABRA5 intergenic region also shows many peaks and dips in the NP signal (Fig. 11V).
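The text does not specify how peaks and dips in the NP signal are called. A minimal sketch, assuming a one-dimensional NP profile sampled at regular positions, marks strict local maxima and minima, with a hypothetical `min_delta` threshold that ignores shallow fluctuations. For real profiles, `scipy.signal.find_peaks` (with its `prominence` argument, applied to the negated signal for dips) is a more robust alternative.

```python
def peaks_and_dips(signal, min_delta=0.0):
    """Return (peaks, dips): indices of strict local maxima and minima.

    A point counts only if it differs from its nearer neighbour by at
    least `min_delta` (hypothetical threshold, same units as the signal).
    """
    peaks, dips = [], []
    for i in range(1, len(signal) - 1):
        left, mid, right = signal[i - 1], signal[i], signal[i + 1]
        if mid > left and mid > right and mid - max(left, right) >= min_delta:
            peaks.append(i)
        elif mid < left and mid < right and min(left, right) - mid >= min_delta:
            dips.append(i)
    return peaks, dips


if __name__ == "__main__":
    profile = [0, 2, 1, 3, 0, 0.5, 0.4]  # synthetic values, not Fig. 11 data
    print(peaks_and_dips(profile))
    print(peaks_and_dips(profile, min_delta=0.5))
```

Raising `min_delta` suppresses small wiggles (here the 0.5 vs 0.4 step) while keeping the pronounced alternation of high and low values.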

In another situation, at sites II-IX, nucleosome clusters may form and be detected when

Fig. 11A,Y shows the number of reads for each intron, taken from GenBank data.

Introns closer to the 3'-end have more reads than those in the middle, and more than the large intron at the 5'-end. This finding is in accordance with the notion that nucleosomes

1T,U). Using UNAFOLD, we studied the secondary structure of a large intron RNA as part of long bidirectional transcripts (Fig. 12). As in the previous cases, the branches may be considered as multiple stable stem-loop substructures with spatially oblong traits. They have approximately the same coordinates on the nucleotide sequence (Table S1) and many incidences of