RT Journal Article
SR Electronic
T1 Widespread separation of the polypyrimidine tract from 3’ AG by G tracts in association with alternative exons in metazoa and plants
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 363804
DO 10.1101/363804
A1 Hai Nguyen
A1 Jiuyong Xie
YR 2018
UL http://biorxiv.org/content/early/2018/07/06/363804.abstract
AB At the end of introns, the polypyrimidine tract (Py) is often close to the 3’ AG, in a consensus (Y)20NCAGgt in humans. Interestingly, we have found that the two can also be separated by purine-rich elements, including G tracts, in thousands of human genes. These regulatory elements between the Py and 3’AG (REPA) mainly regulate alternative 3’ splice sites (3’SS) and intron retention. Here we show their widespread distribution and special properties across kingdoms. By whole-genome analysis, the purine-rich 3’SS are found in up to about 60% of the introns among more than 1000 species/lineages, and up to 18% of these introns contain the REPA G tracts, in about 2.4 million 3’SS in total. In particular, they are significantly enriched over their 3’SS and genome backgrounds in metazoa and plants, and highly associated with alternative splicing of genes in diverse functional clusters. They are also highly enriched (3- to 6-fold) in the canonical as well as aberrantly used 3’ splice sites in cancer patients carrying mutations of the branch point factor SF3B1 or the 3’AG binding factor U2AF35. Moreover, the REPA G tract-harbouring 3’SS have significantly reduced occurrences of branch point (BP) motifs between the −24 and −4 positions, and in particular lack them at the −7 to −5 positions in several model organisms examined. The more distant branch points are associated with increased occurrences of alternative splicing in human and zebrafish. The branch points, REPA G tracts and associated 3’SS motifs appear to have emerged differentially in a phylum- or species-specific way during evolution.
Thus, there is widespread separation of the Py and 3’AG by REPA G tracts, which likely evolved among different species or branches of life. This special 3’SS arrangement contributes to the generation of diverse transcript or protein isoforms in biological functions or diseases through alternative or aberrant splicing.