Sequence and phylogenetic analysis revealed structurally conserved domains and motifs in lincRNA-p21

Long intergenic non-coding RNAs (lincRNAs) are the largest class of long non-coding RNAs in eukaryotes and originate from the intergenic regions of the genome. The ~4 kb lincRNA-p21 is derived from a transcription unit next to the p21/Cdkn1a gene locus. LincRNA-p21 plays key regulatory roles in p53-dependent transcriptional repression and in translational repression through its physical association with proteins such as hnRNP-K and HuR. It is also involved in aberrant gene expression in different cancers. However, detailed information on its structure, recognition, and trans-regulation by proteins remains limited. In this study, we have carried out a complete gene analysis and annotation of lincRNA-p21. This analysis showed that lincRNA-p21 is highly conserved in primates and that its conservation drops significantly in more distantly related organisms. Furthermore, our analysis revealed two structurally conserved domains in the 5' and 3' terminal regions of lincRNA-p21. Phylogenetic analysis revealed discrete evolutionary dynamics in these conserved domains across orthologous lincRNA-p21 sequences, which have evolved more slowly in primates than in other mammals. Using Infernal-based covariance analysis, we computed the secondary structures of these domains. The secondary structures were further validated by energy minimization for the individual orthologous sequences as well as for the full-length human lincRNA-p21. In summary, this analysis identified sequence and structural motifs in the conserved fragments, suggesting that these regions are functionally important.

The functional diversity of lincRNAs arises from their ability to adopt different structures and to interact not only with proteins but also with other RNAs and DNA (22,23).
LncRNAs evolve rapidly, and selection acting on structure rather than primary sequence has been proposed to explain this rapid evolution. This led to the "RNA modular code" hypothesis, which holds that evolutionary selection acts on structural domains in RNA (23)(24)(25). Some experimental evidence supports this concept. For example, the maternally expressed gene 3 (MEG3) lncRNA contains three distinct structural modules: M1, M2, and M3. Deletion analysis showed that modules M2 and M3 are important for p53 activation.
Intriguingly, a hybrid MEG3 transcript, in which half of the primary sequence of the M2 module was replaced by an unrelated artificial sequence with a similar secondary structure, was fully functional in stimulating p53-mediated transcription (26). Similarly, for lincRNAs, structural conservation rather than nucleotide sequence conservation seems to be crucial for maintaining function (27). Several lincRNAs adopt complex secondary and tertiary structures, and their functions often impose only subtle sequence constraints.
Here, we have focused on the sequence, structure, and evolutionary features of lincRNA-p21, a transcriptional target of p53 and HIF-1α (28)(29)(30)(31). LincRNA-p21 is a ~4 kb lncRNA derived from a transcription unit next to the p21/Cdkn1a gene locus (hence the name lincRNA-p21) (6,32). The tumor suppressor p53, the 'guardian of the genome', plays a key role in maintaining genomic integrity (33). Upon DNA damage, p53 triggers a transcriptional response resulting in either cell cycle arrest or apoptosis (34). While p53 directly activates the transcription of numerous genes, p53-mediated gene repression involves interactions with other factors. For example, several lincRNAs that are physically associated with repressive chromatin-modifying complexes have been shown to act as repressors in p53 transcriptional regulatory networks (5). LincRNA-p21 has been identified as one such p53-activated lincRNA (29). p53 binds to highly conserved canonical p53-binding motifs in the promoter region of lincRNA-p21, thereby driving its expression (35,36). LincRNA-p21, in turn, functions as a downstream transcriptional repressor of several genes and hence plays an important role in p53-dependent induction of cell death in response to DNA damage. Recently, a number of reports have shown misregulation and differential expression of lincRNA-p21 in several cancers, including prostate cancer, colorectal cancer, and chronic lymphocytic leukemia, as well as in atherosclerosis (37)(38)(39)(40)(41).
LincRNA-p21 mediates the transcriptional repression through specific association with heterogeneous nuclear ribonucleoprotein K (hnRNP-K). hnRNP-K was previously identified as a component of the repressor complex that acts in the p53 pathway (42,43). It has the ability to bind both ssDNA and ssRNA via its three KH (hnRNP-K homology) domains (44).
Furthermore, lincRNA-p21 was shown to bind hnRNP-K through a conserved 780-nucleotide 5′ region. Interaction of hnRNP-K with lincRNA-p21 was shown to be required for its proper localization and for the subsequent induction of apoptosis via transcriptional repression of p53-regulated genes (29). Overall, these studies implicated the lincRNA-p21/hnRNP-K association in mediating p53-dependent transcriptional repression of several genes. LincRNA-p21 has also been shown to be transported to the cytoplasm, where it represses the translation of several mRNAs (for example, JUNB, β-catenin, and potentially many more that encode proteins involved in cell proliferation and survival). LincRNA-p21 is hypothesized to base-pair directly with target mRNAs and, together with the known translational repressor RCK/p54 (an evolutionarily conserved ATP-dependent DEAD-box helicase), to remodel mRNPs and influence their association with the translation initiation factor eIF4E. Cytoplasmic levels of lincRNA-p21 are regulated by the RRM domain-containing protein HuR/ELAVL1 (7). Association of HuR with lincRNA-p21 favors the recruitment of let-7/Ago2 to lincRNA-p21, reducing lincRNA-p21 stability. This in turn relieves the translational repression of lincRNA-p21-targeted mRNAs (7,45).
In general, lncRNAs show poor sequence conservation across species, with only scattered conserved regions surrounded by long, seemingly unconstrained sequences (27). Phylogenetic analysis of lncRNAs is therefore important because it can reveal conserved and functionally important regions in the RNA. A functional domain in an lncRNA is likely to be conserved across animals and to adopt a structure that is shared by orthologous sequences. This pattern of conservation enables computational genome analyses that identify lncRNAs in different organisms and define their functional domains. This approach has been used to study the origin and evolution of lncRNAs such as XIST and HOTAIR (46)(47)(48). In this study, we used computational and bioinformatic methods to investigate the sequence, structure, and evolution of lincRNA-p21, which functions in both cis and trans (28,29). In particular, we addressed the following questions.

Data
The sequences of the human lincRNA-p21 long isoform (accession number KU881768.1) and mouse lincRNA-p21 (accession number NC_000083.6) were acquired from the National Center for Biotechnology Information (NCBI).

Acquiring sequences orthologous to mouse lincRNA-p21 exons through genome search
Human and mouse lincRNA-p21 sequences were aligned in EMBOSS (53). Human lincRNA-p21 nucleotides 1-198 and 199-3898 were found to align with mouse lincRNA-p21 exon 1 and exon 2, respectively (Figure 1).
Accordingly, for the purpose of our study, the human lincRNA-p21 sequence was divided into two segments spanning nucleotides 1-198 and 199-3898. These two consecutive ranges of human lincRNA-p21 were considered the human equivalents of mouse exon 1 and exon 2, respectively, and were termed segment 1 and segment 2 (Figure 1). Each segment was used separately as a query to search the genomes of chimpanzee, rhesus monkey, gorilla, cow, horse, dolphin, cat, mouse, rat, platypus, chicken, and zebrafish in Ensembl using BLASTN (54). For each query, the best hit obtained in a given genome was considered the sequence orthologous to human lincRNA-p21 in that species. The human sequences and the orthologous sequences obtained for rhesus monkey and cat were then aligned using LocARNA (55). Based on these alignments, two queries (query 1 and query 2) were built using the cmbuild and cmcalibrate functions of Infernal (49).
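The segment boundaries above are 1-based, inclusive coordinates, and they map directly onto string slicing. The following is a minimal illustration that assumes the transcript is held as a plain string. The placeholder sequence and the names `extract_segment` and `SEGMENTS` are hypothetical and are not part of the original analysis.

```python
def extract_segment(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end of seq, using 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} is outside a sequence of length {len(seq)}")
    # Python slices are 0-based and end-exclusive, so only the start is shifted.
    return seq[start - 1:end]


# Segment boundaries from the text (human nucleotides 1-198 and 199-3898).
SEGMENTS = {"segment_1": (1, 198), "segment_2": (199, 3898)}

# Hypothetical placeholder with the length of the human long isoform (3898 nt);
# in practice this would be the KU881768.1 sequence read from a FASTA file.
human_lincrna_p21 = "ACGT" * 974 + "AC"

segments = {name: extract_segment(human_lincrna_p21, s, e)
            for name, (s, e) in SEGMENTS.items()}
for name, sub in segments.items():
    print(name, len(sub))  # segment_1 198, then segment_2 3700
```

Writing each segment to its own FASTA file would then give the two BLASTN queries.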

Structure prediction through sequence alignment and using a thermodynamic method
Sequences corresponding to the highest-scoring Infernal hits in the ten mammals were considered orthologous to the two lincRNA-p21 segments and were named domain A and domain B. They were aligned using the cmalign function of Infernal for phylogenetic analysis.
Structures were predicted for the orthologs of these two domains using PMmulti (51) and Mfold (52). PMmulti performs pairwise and progressive multiple alignment of RNA sequences, and Mfold predicts RNA and DNA secondary structures, mainly by thermodynamic methods. Predicted structures were displayed using either Mfold (52) or PseudoViewer (56).
In all cases, default parameters were used.

Phylogenetic study
Orthologous sequences of lincRNA-p21 domain A and domain B from the 10 mammals were analyzed together with the neighboring Cdkn1a gene using EvoNC (50). For this purpose, EvoNC estimates a parameter ζ: the nucleotide substitution rate in the noncoding region, normalized by the synonymous substitution rate in the coding region.
Therefore, ζ = 1 when a site evolves neutrally, ζ > 1 indicates positive selection, and ζ < 1 suggests negative (purifying) selection. The interpretation of ζ is thus analogous to that of the nonsynonymous/synonymous substitution rate ratio (ω) in models of coding-sequence evolution.
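The ζ decision rule can be written as a small function. This is a minimal sketch that assumes ζ has already been estimated (EvoNC does this by maximum likelihood). The function names and the `tol` tolerance band are hypothetical illustrations, not EvoNC parameters.

```python
def zeta(noncoding_rate: float, synonymous_rate: float) -> float:
    """Noncoding substitution rate normalized by the coding synonymous rate."""
    if synonymous_rate <= 0:
        raise ValueError("synonymous rate must be positive")
    return noncoding_rate / synonymous_rate


def classify_selection(z: float, tol: float = 0.05) -> str:
    """Map a zeta estimate to the direction of selection it suggests.

    `tol` is a hypothetical tolerance band around 1; a real analysis would
    instead test significance with a likelihood ratio test.
    """
    if abs(z - 1.0) <= tol:
        return "neutral"
    return "positive" if z > 1.0 else "negative"


print(classify_selection(zeta(0.3, 0.3)))  # neutral
print(classify_selection(4.94))            # positive
print(classify_selection(0.2))             # negative
```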

Results and Discussion
The sequences of lincRNA-p21 orthologs show poor conservation among vertebrates
LincRNA-p21 is located between the Srsf3 and Cdkn1a protein-coding genes on human chromosome 6. In mouse, mature lincRNA-p21 is located on chromosome 17 and consists of two exons (exon 1 and exon 2), whereas human lincRNA-p21 contains a single exon on chromosome 6 that aligns with both exon 1 and exon 2 of mouse lincRNA-p21 (Figure 1). We searched several vertebrate genomes in the UCSC genome browser for matches to lincRNA-p21. The full-length human lincRNA-p21 sequence showed apparent conservation among mammalian orthologs (Figure 2A and 2B). We then searched different vertebrate genomes in Ensembl separately with the human lincRNA-p21 sequences equivalent to mouse exon 1 and exon 2 (segment 1 and segment 2, respectively), using BLAT and BLASTN (62). These searches returned several hits with high to low scores. We considered a hit orthologous to the human lincRNA-p21 sequence if it was located between the Srsf3 and Cdkn1a genes and passed an E-value cutoff of 1e-05. Notably, for both queries, the highest-scoring hit (lowest E-value) was always located between the two protein-coding genes in all of the placental mammalian genomes studied here. This result suggests that lincRNA-p21 has orthologs in mammals. The top-scoring hit from each genome is listed in Table 1. The hits found in primates had high scores, whereas hits from all other non-primate mammals had comparatively poor scores. In other words, close matches were observed only in primates and not in other mammals. Finally, for platypus (a monotreme mammal) and the non-mammalian vertebrates chicken and zebrafish, no hits were found between the two protein-coding genes. This finding implies that if lincRNA-p21 has orthologs in these species, their sequences may be too divergent to be identified by sequence search alone.
Compensatory mutations, which are common in ncRNAs and change the primary sequence while preserving the secondary structure, may be one reason for the poor sequence conservation (63,64).
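The orthology criterion described above has two parts: the best hit must lie between Srsf3 and Cdkn1a, and it must pass an E-value cutoff of 1e-05. This can be expressed as a filter followed by a maximum over scores. The sketch below makes several assumptions. The `Hit` record, the function name, and the toy coordinates are all hypothetical, and hit coordinates are assumed to be normalized so that start ≤ end regardless of strand.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Hit:
    chrom: str
    start: int    # 1-based start, assumed normalized so that start <= end
    end: int      # 1-based end
    score: float
    evalue: float


def best_ortholog_hit(hits: Sequence[Hit], chrom: str, locus_start: int,
                      locus_end: int, max_evalue: float = 1e-5) -> Optional[Hit]:
    """Return the highest-scoring hit lying inside the intergenic locus.

    locus_start/locus_end delimit the region between Srsf3 and Cdkn1a;
    hits outside it or above the E-value cutoff are discarded.
    """
    candidates = [h for h in hits
                  if h.chrom == chrom
                  and locus_start <= h.start <= h.end <= locus_end
                  and h.evalue <= max_evalue]
    if not candidates:
        return None
    # Highest score wins; ties are broken by the smaller E-value.
    return max(candidates, key=lambda h: (h.score, -h.evalue))


# Toy hits with hypothetical coordinates.
hits = [
    Hit("chr6", 1200, 1400, 336.0, 3e-90),    # inside the locus
    Hit("chr6", 90000, 90200, 410.0, 1e-95),  # outside the locus
    Hit("chr6", 1500, 1560, 48.0, 2e-3),      # fails the E-value cutoff
]
print(best_ortholog_hit(hits, "chr6", 1000, 5000))
```

The hit with the higher raw score is rejected because it lies outside the locus. Applying the positional constraint before ranking therefore matters.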

LincRNA-p21 exists in mammals and shows high conservation among primates
LncRNAs are characterized not only by variable sequences but also by conserved structures.
Therefore, we extended our investigation with a structure-based search of different vertebrate genomes to test whether lincRNA-p21 exists only in mammals. Infernal was used to search whole genomes for matches to lincRNA-p21. Infernal is a local RNA alignment and search program that builds covariance models combining sequence consensus with conserved RNA secondary structure (49). To construct representative queries for building covariance models of lincRNA-p21, we used the human lincRNA-p21 segment 1 and segment 2 sequences together with their orthologous sequences in two other species. The first was rhesus monkey, a primate more distantly related to human (BLASTN score: 336, E-value: 3e-90 and BLASTN score: 5174, E-value: 0, for segments 1 and 2, respectively). The second was cat, a non-primate mammal (BLASTN score: 60, E-value: 4e-07 and BLASTN score: 174, E-value: 3e-40, for segments 1 and 2, respectively). Two queries (query 1 and query 2) were built from these sequences using the cmbuild and cmcalibrate functions of Infernal (49). These two queries were used to search the complete genomes of ten placental mammals (human, chimpanzee, rhesus monkey, gorilla, cow, horse, cat, dolphin, mouse, and rat), the monotreme platypus, and two non-mammalian vertebrates, chicken and zebrafish (Figure 3). Orthologous sequences of the lincRNA-p21 segments (high-scoring hits located between Srsf3 and Cdkn1a) were obtained through the Infernal search in all of the placental mammals but not in platypus or the non-mammalian vertebrates (Figure 3).
Notably, each query produced just one high-scoring hit in each mammalian genome (Figure 2B). Among the mammals, however, query 1 did not produce any high-scoring hits in rodents, and query 2 did not produce any high-scoring hits in ungulates such as horse and dolphin. Both queries produced good matches in primates but poor matches in the remaining mammals (Figure 3 and Table 2).
In contrast, the top hits from platypus and the non-mammalian vertebrates (chicken and zebrafish) had much lower scores than those from the placental mammals (Table 2). For example, in chicken, the highest-scoring hits had Infernal scores of only 25.2 (E-value 0.0666) and 25.9 (E-value 2.0) for query 1 and query 2, respectively. Moreover, these hits were not located between the Srsf3 and Cdkn1a genes. Thus, the combination of sequence and structural searches indicated that, among the genomes examined, lincRNA-p21 exists only in mammals and has become highly conserved in primates.
We hypothesize that lincRNA-p21 may have conserved structural domains but divergent sequences in different vertebrates, a general feature of lncRNAs (65). For example, Xist and HOTAIR both contain fast-evolving sequences as well as highly conserved structures (48,66). The factors that constrain lncRNA evolution are not clear. Understanding the evolutionary constraints on an lncRNA such as lincRNA-p21, which functions both in cis and in trans to control local and global gene expression, is therefore of particular interest.
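The covariance-model workflow in this section (build, calibrate, then search each genome) can be scripted. The sketch below uses the standard Infernal command-line programs `cmbuild`, `cmcalibrate`, and `cmsearch` (with `--cpu` and `--tblout`). The file names and the helper functions `infernal_commands` and `run_pipeline` are hypothetical. It assumes the LocARNA alignment has been exported to Stockholm format with a consensus structure line (`#=GC SS_cons`), which cmbuild expects.

```python
import shlex
import shutil
import subprocess
from typing import List


def infernal_commands(msa_sto: str, cm_file: str, genome_fa: str,
                      tblout: str, cpu: int = 4) -> List[List[str]]:
    """Build the cmbuild -> cmcalibrate -> cmsearch command lines for one query."""
    return [
        # Build a covariance model from a structure-annotated Stockholm alignment.
        ["cmbuild", cm_file, msa_sto],
        # Calibrate E-value statistics for the model (slow, run once per query).
        ["cmcalibrate", "--cpu", str(cpu), cm_file],
        # Search one genome and write a tabular hit list.
        ["cmsearch", "--cpu", str(cpu), "--tblout", tblout, cm_file, genome_fa],
    ]


def run_pipeline(commands: List[List[str]]) -> None:
    """Execute the commands in order, failing early if Infernal is missing."""
    if shutil.which("cmsearch") is None:
        raise RuntimeError("Infernal executables not found on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Hypothetical file names for query 1 and one target genome.
cmds = infernal_commands("query1.sto", "query1.cm", "chimpanzee.fa", "query1_chimp.tbl")
for cmd in cmds:
    print(shlex.join(cmd))
```

Keeping command construction separate from execution makes it easy to inspect the commands, or to loop the `cmsearch` step over all thirteen genomes while reusing one calibrated model.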

Two conserved domains identified in the 5' and 3' terminal regions were found unique to lincRNA-p21
Besides the single high-scoring hit located between Srsf3 and Cdkn1a, both queries (query 1 and query 2) produced numerous scattered low-scoring hits in mammalian as well as other vertebrate genomes. For both queries, the number of these non-specific hits varied across human, chimpanzee, gorilla, rhesus monkey, cow, horse, dolphin, rat, mouse, platypus, chicken and zebrafish. The 3707-nucleotide query 2 produced many more low-scoring hits than the 198-nucleotide query 1.
However, whether these hits have any functional role is not clear; they are expected to be largely insignificant and random. Because, along with lincRNA-p21, a few other lncRNAs have been reported to interact with polycomb proteins, the Infernal search might have detected consensus protein-binding sequences shared by different lncRNAs.
Because the best hit for query 2 was less conserved in non-primate mammals and much shorter in all other genomes (Figure 3), we inferred that the functionally conserved domain(s) of query 2 in mammals are likely much shorter than 3707 nucleotides. In contrast, because the best hits for query 1 spanned its entire length in all mammals (Figure 3), we assume that the entire 198-nucleotide segment 1 may have a structurally conserved function. The data in Figure 3 show that the best structural hits were found as shorter fragments one at the 5'-terminal

Phylogenetic distribution of orthologous sequences of the conserved domains of lincRNA-p21
Protein-coding genes commonly originate by gene duplication followed by neofunctionalisation (67)(68)(69) and/or subfunctionalisation (70,71). However, the process and dynamics of evolution in non-coding RNAs are not well understood. We therefore analyzed the molecular evolution of lincRNA-p21 in detail. Using the sequences orthologous to domain A and those orthologous to domain B, we built two phylogenetic trees with Phylip (Figure 4; see also the Methods section). We assumed that nucleotide substitutions followed the HKY85 nucleotide substitution model (58) (72)) value >1 (Table 3), most sites in these regions should undergo moderately high substitution rates, although a few sites might have slow rates of substitution.
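For readers less familiar with HKY85, the model's instantaneous rate matrix can be built from two inputs: a transition/transversion ratio κ and the equilibrium base frequencies π. The sketch below is an illustration only. The κ and π values are hypothetical and are not the estimates behind Figure 4 or Table 3.

```python
from typing import Dict, List

BASES = "ACGT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky85_rate_matrix(kappa: float, freqs: Dict[str, float]) -> List[List[float]]:
    """Return the HKY85 rate matrix Q, scaled to one expected substitution per unit time."""
    if abs(sum(freqs.values()) - 1.0) > 1e-9:
        raise ValueError("base frequencies must sum to 1")
    q = [[0.0] * 4 for _ in BASES]
    for i, from_base in enumerate(BASES):
        for j, to_base in enumerate(BASES):
            if i != j:
                # Rate is proportional to the target base frequency,
                # multiplied by kappa for transitions (A<->G, C<->T).
                rate = freqs[to_base]
                if (from_base, to_base) in TRANSITIONS:
                    rate *= kappa
                q[i][j] = rate
        q[i][i] = -sum(q[i])  # rows of a rate matrix sum to zero
    # Mean substitution rate at equilibrium: -sum_i pi_i * Q_ii.
    mu = -sum(freqs[b] * q[i][i] for i, b in enumerate(BASES))
    return [[x / mu for x in row] for row in q]


q = hky85_rate_matrix(kappa=4.0, freqs={"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3})
for base, row in zip(BASES, q):
    print(base, [round(x, 3) for x in row])
```

With κ = 1 and equal base frequencies, the matrix reduces to the Jukes-Cantor model, with all off-diagonal rates equal to 1/3.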
To investigate lincRNA-p21 evolution in further detail, we explored whether nucleotide substitution rates varied among clades. A log-likelihood ratio test was performed to determine whether the HKY85 model fit the data better with or without a global clock. (Table 3) estimated for primates (r1) were low compared to those for non-primate mammals (r2 and r3). Table 3 shows that for domain A, the rate of evolution in ungulates and carnivores, r2 (4.88), is more than twice the rate in primates, r1 (2.26). Similarly, for domain B, r2 in ungulates and carnivores (8.0) is more than three times r1 in primates (2.15). Similar trends were observed for both domain A and domain B when r1 in primates was compared with r3, which includes rodents, in the local clock estimation. These different rates of nucleotide substitution among clades suggest that the lincRNA-p21 domains followed discrete evolutionary dynamics in mammals, with a slow rate of evolution in primates. Interestingly, a 5' segment of lincRNA-p21 has been reported to bind the hnRNP-K protein (29). Whether the slow evolution of domain A in primates is related to this protein-binding function requires further investigation.
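The global-clock comparison is a standard likelihood ratio test. Twice the log-likelihood difference between the unconstrained tree and the clock-constrained tree is compared with a χ² distribution with n - 2 degrees of freedom for n taxa. The following is a minimal, dependency-free sketch. It assumes the two log-likelihoods have already been obtained from a phylogenetics package, and the numbers used are hypothetical, not the values behind Table 3.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood ratio statistic 2 * (lnL_alt - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability, exact closed form for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def clock_test(lnl_clock: float, lnl_no_clock: float, n_taxa: int) -> float:
    """P-value for rejecting a global molecular clock; df = n_taxa - 2."""
    stat = lrt_statistic(lnl_clock, lnl_no_clock)
    return chi2_sf_even_df(stat, n_taxa - 2)


# Hypothetical log-likelihoods for a ten-taxon tree (df = 8).
p = clock_test(lnl_clock=-2500.0, lnl_no_clock=-2480.0, n_taxa=10)
print(f"p = {p:.2e}")
```

The closed form covers the ten-mammal case (df = 8). For an odd number of degrees of freedom, a library routine such as SciPy's `scipy.stats.chi2.sf` would be the usual choice.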

LincRNA-p21 is rapidly evolving compared to its nearby protein-coding Cdkn1a gene
Cdkn1a and Srsf3 are the two protein-coding genes neighboring lincRNA-p21. Because Srsf3 is absent in gorilla and cat, we compared the evolution of the Cdkn1a exon with that of lincRNA-p21 domain A in the ten mammals (see Methods section for details). Cdkn1a is present in all vertebrates, whereas lincRNA-p21 exists only in mammals. We therefore asked whether lincRNA-p21 has evolved faster or slower than the neighboring Cdkn1a gene. For this purpose, we used EvoNC, a program for detecting selection in noncoding regions of nucleotide sequences (50). For protein-coding sequences, the ratio of nonsynonymous to synonymous substitution rates is used to detect selection pressure and its direction (i.e., positive or negative selection).
Selection on noncoding sequences can be detected in a similar way by estimating their substitution rate relative to the rate of synonymous substitution in coding sequences, using a parameter named ζ (please see Methods section for further details). A ζ value of 1 indicates that a site in a noncoding sequence evolved neutrally, whereas ζ > 1 and ζ < 1 suggest positive and negative selection, respectively (50). We concatenated the aligned domain A sequences of lincRNA-p21 with the Cdkn1a exon region for each of the ten mammals and analyzed the resulting sequences using EvoNC. A similar approach was used to study the evolution of the lncRNA HOTAIR with respect to a neighboring protein-coding gene (48). The program implements three models: a neutral model, a two-category model, and a three-category model. The ζ values (ζ0, ζ1 and ζ2) estimated under these models are shown in Table 4. The ζ2 value of 4.94 under the three-category model strongly suggests that the lincRNA-p21 region was under positive selection and evolved faster than Cdkn1a. Typically, a gene with an important biological function evolves slowly in order to maintain that function.
However, exceptions can be found when a gene is recent and still evolving (73), which is likely true for the lincRNA-p21 gene. Nevertheless, the factors that drive this positive selection are yet to be determined. Notably, selection in lncRNAs is thought to act on structure rather than primary sequence, which may also explain their rapid rate of evolution (23,25,74).
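The concatenation step described above joins, species by species, the aligned noncoding rows (domain A) with the aligned coding rows (Cdkn1a exon). It also records where the coding partition begins. The sketch below shows one way to do this, assuming both alignments are held as species-to-row dictionaries in the same DNA alphabet. The function names and toy rows are hypothetical, and EvoNC's actual input format and partition specification are not reproduced here.

```python
from typing import Dict, Tuple


def alignment_width(aln: Dict[str, str]) -> int:
    """Return the common row length of an alignment, checking that all rows agree."""
    widths = {len(row) for row in aln.values()}
    if len(widths) != 1:
        raise ValueError("rows of an alignment must all have the same length")
    return widths.pop()


def concatenate_alignments(noncoding: Dict[str, str],
                           coding: Dict[str, str]) -> Tuple[Dict[str, str], int]:
    """Join two per-species alignments column-wise.

    Returns the concatenated alignment and the 1-based column where the
    coding partition starts, so the two partitions can be labelled downstream.
    """
    if set(noncoding) != set(coding):
        raise ValueError("both alignments must contain the same species")
    boundary = alignment_width(noncoding) + 1
    alignment_width(coding)  # validate the coding alignment as well
    merged = {sp: noncoding[sp] + coding[sp] for sp in sorted(noncoding)}
    return merged, boundary


# Toy rows; real input would be the domain A alignment (converted from U to T)
# and the in-frame aligned Cdkn1a exon for all ten mammals.
domain_a = {"human": "ACG-T", "mouse": "ACGGT"}
cdkn1a = {"human": "ATGTCA", "mouse": "ATGTCC"}
merged, coding_start = concatenate_alignments(domain_a, cdkn1a)
print(coding_start)     # 6
print(merged["human"])  # ACG-TATGTCA
```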

Structure prediction revealed two domains in lincRNA-p21 with invariable sequences and structures in mammals
LincRNA-p21 has been reported to interact with several proteins to exert its functions in the cell.
Therefore, it is important to identify the structure of the functionally important domains in its sequence (28,29). Figure 5A and Figure 6A, was configured through Infernal by aligning the identified orthologous sequences from different genomes that were previously obtained using Infernal's structure-based genome searches (49). Table 5 shows the bit score and the average posterior probability (0 to 1) over all aligned nucleotides in each sequence of the alignment. The high posterior probabilities (0.87 to 1) estimated for the models indicate good confidence that each aligned nucleotide of each orthologous sequence belongs where it appears in the alignment. The structures of the individual orthologous sequences for both queries were computed with PMmulti (51) and compared with the corresponding covariance model.
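Infernal's cmalign annotates each aligned residue with a one-character posterior-probability code, so an average like the one in Table 5 can be approximated from that annotation line. This sketch assumes the HMMER/Infernal encoding: '0' = 0-0.05, '1' = 0.05-0.15, ..., '9' = 0.85-0.95, '*' = 0.95-1.0. Because the codes are binned, bin midpoints give only an approximation of the exact averages that Infernal itself reports. The function name is hypothetical.

```python
from typing import Dict

# Assumed posterior-probability codes (see lead-in); each code is replaced
# by the midpoint of its bin.
PP_MIDPOINTS: Dict[str, float] = {"0": 0.025, "*": 0.975}
PP_MIDPOINTS.update({str(d): d / 10 for d in range(1, 10)})


def mean_posterior_probability(pp_line: str) -> float:
    """Approximate the average PP over aligned residues from a #=GR PP string.

    Gap characters (e.g. '.', '-', '~') carry no PP code and are skipped.
    """
    values = [PP_MIDPOINTS[c] for c in pp_line if c in PP_MIDPOINTS]
    if not values:
        raise ValueError("no posterior-probability codes found")
    return sum(values) / len(values)


print(round(mean_posterior_probability("**..9*"), 3))  # 0.956
```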
Because each query produced only one high-scoring hit positioned between Srsf3 and Cdkn1a, we argue that the structures may be reasonable. The thermodynamic approach implemented in Mfold (52) was used as a second method to validate multiple potential structures for the mammalian orthologs of lincRNA-p21. Figure 3 shows that the best structural hits were found as short fragments, one in the 5' terminal region (domain A) and another toward the 3' terminal region (domain B) of full-length lincRNA-p21. Because a 5' region of lincRNA-p21 was previously shown to bind hnRNP-K (29), we assumed that this region is likely to be conserved in mammals. We therefore attempted to identify a structured functional domain in domain A using the two constraints mentioned above.
Visualization with PseudoViewer (56) showed that the consensus covariance structure for domain A consists of two arcs, each containing three substructures with hairpin loops (Figure 5A). The two arcs are connected by a central stem containing an internal bulge (Figure 5A). Two hairpin loops marked '1' and '2' in the top substructure were found in all mammals (except dolphin, which lacks loop 2) (Figure 5B). This indicates that they could be functionally important sub-domains in the 5' region of lincRNA-p21.
Further, we found a single occurrence of a UCCC sequence motif within the 5' conserved domain (domain A) of lincRNA-p21. Notably, hnRNP-K has three RNA-binding KH domains (KH1-3), and KH domains in general have been shown to have specificity for UCCC-containing motifs in DNA and RNA (75). In the consensus covariance structure, this motif lies in a region with a high probability of being a single-stranded loop (Figure S1A). The UCCC motif remains conserved, both in sequence and in being single-stranded, in the PMmulti-predicted structures of lincRNA-p21 in all primates (Figure 5B). We therefore postulate that the UCCC-containing region in domain A may constitute the binding motif for hnRNP-K. Whether the flanking sequences affect protein binding remains to be investigated experimentally.
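Checking whether a motif such as UCCC sits in a single-stranded region of a predicted structure reduces to two steps: find the motif, then inspect the matching window of the dot-bracket string. This is a hedged sketch that assumes a single predicted structure in dot-bracket notation, where '.' marks an unpaired nucleotide. The function name and the toy sequences are hypothetical and are not lincRNA-p21 data.

```python
from typing import List, Tuple


def motif_sites(seq: str, structure: str, motif: str = "UCCC") -> List[Tuple[int, bool]]:
    """Find every occurrence of motif and report whether it is fully unpaired.

    Returns (1-based start, is_single_stranded) tuples.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    seq = seq.upper().replace("T", "U")  # accept DNA or RNA input
    sites = []
    start = seq.find(motif)
    while start != -1:
        window = structure[start:start + len(motif)]
        sites.append((start + 1, all(c == "." for c in window)))
        start = seq.find(motif, start + 1)  # allow overlapping matches
    return sites


# Toy hairpin with the UCCC motif in the loop.
print(motif_sites("GGGGAUCCCAACCCC", "((((.......))))"))  # [(6, True)]
# Toy hairpin with the motif buried in the stem.
print(motif_sites("UCCCAAAAGGGA", "((((....))))"))      # [(1, False)]
```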
We next used Mfold to predict structures of orthologous sequences of domain A in each mammal (52). Mfold predicted 12 structures in human, 12 in chimpanzee, 12 in gorilla, 10 in rhesus monkey, 7 in cow, 11 in cat, 14 in dolphin, 19 in horse, 17 in mouse and 7 in rat.
Representative structures from each mammal are shown in Figure S2. Notably, the first hairpin (marked '1' in Figure 5A) was found at the same position in 9 of the 12 predicted structures in human, and a similar trend was observed for the other mammals. Among them, the structures from human, chimpanzee, gorilla, rhesus monkey, horse, cat, mouse and rat contain a conserved 'CAUC' tetraloop in the single-stranded hairpin loop (Figure S2). This tetraloop resembles the hairpin loop of substructure '1' in the Infernal-predicted consensus structure of domain A (Figure 5A). In contrast, hairpin loop '2' in Figure 5A had no clear consensus counterpart in the Mfold-predicted structures (Figure S2).
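Comparing tetraloops such as CAUC (domain A) or GAAA (domain B) across many Mfold structures is easier if hairpin loops are extracted programmatically. The sketch below assumes the structures have been exported to dot-bracket notation. Mfold also writes other formats such as CT files, and that conversion is not shown. Only round brackets are handled (no pseudoknots), and the function names and toy sequences are hypothetical.

```python
from typing import Dict, List, Tuple


def pair_table(structure: str) -> Dict[int, int]:
    """Map each paired position to its partner (0-based) from dot-bracket notation."""
    stack: List[int] = []
    pairs: Dict[int, int] = {}
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i + 1}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1] + 1}")
    return pairs


def hairpin_loops(seq: str, structure: str) -> List[Tuple[int, str]]:
    """Return (1-based loop start, loop sequence) for every hairpin loop.

    A hairpin loop is the unpaired stretch closed by a base pair (i, j)
    with no other pair nested inside it.
    """
    pairs = pair_table(structure)
    loops = []
    for i, c in enumerate(structure):
        if c == "(":
            j = pairs[i]
            inner = structure[i + 1:j]
            if inner and set(inner) == {"."}:
                loops.append((i + 2, seq[i + 1:j]))
    return loops


# Toy stem-loops: one closes a CAUC tetraloop, the other a GAAA tetraloop.
print(hairpin_loops("GGACCAUCGUCC", "((((....))))"))       # [(5, 'CAUC')]
print(hairpin_loops("GCAAAGCGGGAAACC", "((...))((....))"))  # [(3, 'AAA'), (10, 'GAAA')]
```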
Similarly, PseudoViewer showed that the consensus covariance structure for domain B consists of an arc with three stem-loop substructures (Figure 6A and Figure S1B).
The stem-loop substructure marked by a red oval was found in all animals (Figure 6B), indicating that it could contain the functional domain in the 3' terminal region of lincRNA-p21.
As stated previously, this covariance structure was compared with all of the structures predicted by Mfold. Mfold predicted 2 structures in human, 2 in chimpanzee, 2 in gorilla, 5 in rhesus monkey, 5 in cow, 2 in cat, 1 in dolphin, 3 in horse, 8 in mouse and 4 in rat.
Representative structures from each mammal are shown in Figure S3. Notably, the hairpin (red oval in Figure 6A) was found at the same position in the predicted structure in human (Figure S3). Similar results were obtained for all other animals except cow and horse (Figure S3). The loop of this hairpin consists of the sequence GAAA in human, chimpanzee, gorilla, rhesus monkey, cat, dolphin, rat and mouse (Figure S3).  Figure S4).

Alu repeats exist in human lincRNA-p21 but show poor conservation in mammals
Isoforms of human lincRNA-p21 were found to contain inverted repeat Alu elements (IRAlus) (76). The sense element of these IRAlus is located at positions 2589-2895, and the antisense Alu element at positions 1351-1651. These IRAlus were shown to form independent structural domains in the context of full-length human lincRNA-p21 (76) and to be important regulators of its cellular localization over the course of the stress response. We searched several vertebrate genomes in the UCSC databases for matches to the human lincRNA-p21 IRAlus using BLAT (62). The BLAT results show that close matches for both IRAlus were found only in primates (Figure S5A and S5B). BLAT searches produced hits between the Srsf3 and Cdkn1a genes in chimpanzee, gorilla (partial) and rhesus monkey. However, no significant hits were found in non-primate mammals or other vertebrates. In contrast, the conserved domains A and B at the 5' and 3' termini of lincRNA-p21, respectively, were found in all mammalian orthologs (Figure S6A and S6B). The IRAlu regions therefore appear to have arisen recently in lincRNA-p21, possibly imparting additional functions to the RNA.

Conclusions
The orthologous lincRNA-p21 sequences identified with the RNA homology search software Infernal contain sequence mismatches and gaps. We therefore inferred that lincRNA-p21 harbors poorly conserved sequences but considerably conserved structures in the ten examined mammals, a feature prevalent in other lncRNAs (65). The Infernal search found just one high-scoring hit in each of these mammals, located between the Cdkn1a and Srsf3 genes. Thus, it can be concluded that full-length lincRNA-p21 exists only in mammals. In addition to this single high-scoring hit, several low-scoring hits were found elsewhere in mammalian and other vertebrate genomes. This suggests that other lncRNAs may share similar functional domains with lincRNA-p21. However, how much conservation lncRNAs require to preserve function while their sequences evolve remains elusive.
Phylogenetic analysis of the conserved segments of lincRNA-p21 in ten mammals covering primates, rodents, carnivores and ungulates revealed discrete evolutionary dynamics for orthologous lincRNA-p21 sequences, with different nucleotide substitution rates between clades. Notably, both domains evolved more slowly in primates than in non-primate mammals. This suggests strong functional constraints that may restrict further evolution of the conserved lincRNA-p21 domains in primates. Comparison between the orthologous sequences of lincRNA-p21 domain A and the exon of Cdkn1a (a protein-coding gene) showed that lincRNA-p21 underwent positive selection and evolved significantly faster than the neighboring Cdkn1a exon. LincRNA-p21 is not found in non-mammalian vertebrates and has evolved faster than its neighboring gene, which suggests that it is a recently evolved gene.
Given that most lncRNAs, including Xist and HOTAIR, have so far been found only in mammals, an interesting open question is when and why these lncRNAs emerged in higher vertebrates to mediate genome modifications and other functions.
A comparative computational approach was used to predict the sequences and structures of conserved functional domains of lincRNA-p21 in mammals. PMmulti (51) and Mfold (52) were used to predict multiple potential structures for orthologs of lincRNA-p21 domains in