A general decrease of normalized ncDNA in evolution

Complexity is often associated with increasing non-coding DNA (ncDNA). For example, the human genome is in its vast majority ncDNA. Here, it is hypothesized that normalized ncDNA (nncDNA) has in fact diminished in evolution. This definition of ncDNA content takes into consideration total proteomic content. It is shown that by reducing their normalized ncDNA, organisms may have obtained more complexity in evolution. Also, a potential connection between ncDNA, proteome information and chromatin interactions in mice and humans is presented.


Introduction
Life comprises a large number of forms; all of which composed of one or more cells [1]. In these, one or several DNA molecules carry the genetic information. DNA therefore controls cellular activities, and is transmitted across generations [1] [2].
Initially, the quantity of DNA was thought to determine complexity in organisms [3]. Simpler organisms would have less cellular DNA than more complex ones. However, it soon became clear that it was not the case. Neither the total amount of DNA nor the number genes seem to correlate accurately with complexity. This has been termed the Cparadox [3] and the G paradox [4], respectively.
A more successful approach to account for complexity consists of noncoding DNA (ncDNA). ncDNA is defined as DNA that does not encode proteins [5]. The percentage of ncDNA in the genome has been reported as a valid predictor of complexity [6] [7]. In this way, it is not large amounts of DNA or number of genes, but rather large ncDNA percentages, what seems to characterize complex organisms [6] [7].
Interestingly, ncDNA comprises i) identified regulatory sequences, but especially ii) large regions of unknown function. The latter are primarily transposonderived DNA; i.e DNA created by molecular parasites known as transposable elements. As they lack of apparent function, these regions are often referred as junk DNA [8]. The fact that genomes of advanced species are mainly ncDNA is thus intriguing. So, even though ncDNA is an effective indicator of complexity, the reason for this remains unclear.
Another more intuitive predictor of complexity can be suggested. In this case, it is the total number of different proteins that an organism can display -referred here as Proteome Information Units (PIUS). The referred amount does not correspond to the number of genes, as alternative splicing and posttranslational modifications allow genes to code for multiple proteins. Consequently, PIUS take into consideration the standard set of proteins plus their isoforms.
In the present work, the relation between these two complexity indicators ncDNA percentage and Proteome Information Units is assessed. The ratio, named here normalized ncDNA (nncDNA), reveals a characteristic behavior which could provide insights into evolution.

Results
A general decrease in normalized ncDNA: Previous studies [6][7] were used to extract information about ncDNA content for multiple species. The focus was not on the total amount of kilobases of ncDNA, but on ncDNA percentages in relation to the whole genome.
On the other hand, Uniprot [9] served as a source of proteomic data. Particularly, reference proteomes were employed. Reference proteomes constitute a series of proteomes selected to represent biological diversity [9]. Uniprot provides a general overview of the data in terms of i) number of proteins of the haploid genome, ii) number of isoforms and iii) mappings. All of these elements are indicated for each species. Here, the elements i) and ii) were added in each case to calculate the PIUS values as discussed in the introduction.
Once the ncDNA data and proteomic data were collected, the next step was to select the organisms for which both types of information was available. The results include a total of 26 species, ranging from bacteria to mammals. Table 1 contains for each organism its correspondent i) ncDNA percentage, ii) PIUS, iii) ncDNA percentage • PIUS⁻¹ ratio (i.e nncDNA) and iv) an identifying letter.
When plotting the data in Table 1 a characteristic graph is obtained (Figure 1). Notably, a direct correlation is not found, but rather organisms with higher PIUS (green) seem to have lower ncDNA percentages than expected. Also, organisms in the other phase of the graph (purple) belong to more primitive phylums. This can be explained by a reduction in nncDNA values in more advanced species. Examination of Figure 1 also reveals a possible limitation on PIUS complexity based on the relation between ncDNA percentage and PIUS. As PIUS values increase, normally so do ncDNA percentages. Since ncDNA percentages cannot logically exceed 100%, there is a limitation on the theoretical maximum PIUS.
Nevertheless, the ratio between ncDNA percentage and PIUS -i.e nncDNA is shown to have presumably decreased in evolution. In this manner, more advanced organisms would have achieved greater PIUS by reducing their nncDNA values.
Exploring the role of chromatin interactions: One of the factors relating ncDNA percentages and PIUS may be regulation of gene expression. At the same time, chromatin interactions are known to regulate gene expression [10]. Therefore, these three elements (ncDNA, PIUS and chromatin interactions) could be connected. To study this possibility, information from the 4DGenome database [11] was retrieved. Particularly, Mus musculus and Homo sapiens data was selected, as these species are closely related and share most gene regulation mechanisms [12].
The results show that, even though total ncDNA, PIUS and number of chromatin interactions vary substantially from mice to humans, their proportion (k) is approximately constant. This is shown in Table 2: Table 2. Relation between total ncDNA, PIUS and total chromatin interactions. The parameter k corresponds to the ratio Total ncDNA (b.p) • (PIUS • Total chromatin interactions) ¹ ⁻ .

Discussion
In this work, it is shown that normalized ncDNA has generally decreased in evolution. This is in contrast with ncDNA percentages, where organismal complexity and ncDNA percentages are positively correlated. The advantage of normalized ncDNA over ncDNA percentages is that it includes another complexity indicator, in this case Proteome Information Units.
nncDNA may not be used to measure biological complexity -ncDNA percentages or PIUS are better indicators but rather to gauge efficiency in regulation. Since ncDNA contains regulatory DNA, lower nncDNA values indicate more efficient control per Proteome Information Unit. In addition, lower nncDNAs allow higher theoretical maximum PIUS values. This is in agreement with the observed trend of nncDNA decrease in evolution. However, data (genomic and proteomic) from more species is necessary to confirm these results.
Finally, an empirical constant between ncDNA, PIUS and chromatin interactions is presented for humans and mice. In this manner, PIUS and chromatin interactions seem to be inversely proportional. Information from other related species is also needed to further prove this association.