## Abstract

Experimental studies reveal that genome architecture splits into natural domains, suggesting a well-structured genomic architecture in which, for each species, genome populations are integrated by individual mutational variants. Herein, we show that the architecture of population genomes from the same or closely related species can be quantitatively represented in terms of the direct sum of homocyclic abelian groups defined on the genetic code, where populations from the same species lead to the same canonical decomposition into *p*-groups. This finding unveils new ground for the application of abelian group theory to genomics and epigenomics, opens new horizons for the study of biological processes at genomic scale, and provides a new lens for genomic medicine.

## 1 Introduction

The analysis of genome architecture is one of the biggest challenges for current and future genomics. Current bioinformatic tools make the genome annotation process faster than it was some years ago [1]. Current experimental genomic studies suggest that genome architectures must obey specific mathematical biophysics rules [2–5].

Experimental results point to an injective relationship, *DNA sequence* → *3D chromatin architecture* [2,3,5], and failures of DNA repair mechanisms to preserve the integrity of DNA sequences lead to dysfunctional genomic rearrangements, which are frequently reported in several diseases [4]. Hence, some hierarchical logic is inherent to the genetic information system, which makes it amenable to mathematical study. In particular, there are mathematical biology reasons to analyze the genetic information system as a communication system [6–9].

Under the assumption that current forms of life evolved from simple primordial cells with a very simple genomic structure and a robust coding apparatus, the genetic code is a fundamental link to the primeval form of life, and it played an essential role in the primordial architecture. The genetic code, the set of biochemical rules used by living cells to translate information encoded within genetic material into proteins, sets the basis for our understanding of the mathematical logic inherent to the genetic information system [8,10]. The genetic code is the cornerstone of life on earth. Not a single form of life could evolve or exist, as we currently know it, without the genetic code.

Several genetic code algebraic structures have been introduced to study the effect of the quantitative relationship between the coding apparatus and the mutational process on protein-coding regions [11–15]. Formally, the genetic code is limited to translated coding regions, where the number of RNA bases is a multiple of 3. However, as discussed in reference [16], the difficulties in the prebiotic synthesis of the nucleoside components of RNA (nucleobase + sugar) suggest that some of the original bases may not have been the present purines or pyrimidines [17]. Piccirilli et al. [18] demonstrated that the alphabet can in principle be larger. Switzer et al. [19] have shown an enzymatic incorporation of new functionalized bases into RNA and DNA. This expanded the genetic alphabet from 4 to 5 or more letters, which permits new base pairs and provides RNA molecules with the potential to greatly increase their catalytic power.

It is important to notice that, even in the current (*friendly*) environmental conditions, not a single cell can survive without a DNA repair enzymatic machinery, and that such an enzymatic machinery did not exist at all in the primeval forms of life. Here, we are confronting the *chicken and egg* problem. To date, the best solution (to our knowledge) is the admission of alternative base pairs in the primordial DNA alphabet which, as suggested by studies on prebiotic chemistry, could contribute to the thermal and general physicochemical stability of the primordial DNA molecules.

Several algebraic structures have been proposed that include an additional letter in the DNA alphabet {A, C, G, T}. The new letter (D) stands for current insertion/deletion mutations or for alternative wobble base pairing, which would be a relict fingerprint from primordial enzymes derived from a more degenerate ancestral genetic code [16,20,21]. Supporting evidence for the existence of a more degenerate ancestral genetic code built on a larger alphabet is found in the tRNA anticodon region, which permits wobble base pairing by including, e.g., bases such as inosine (in eukaryotes), agmatidine (in archaea), and lysidine (in bacteria), which have been proposed as evolutionary solutions to the need to lower the high translational noise connected to the reading of the AUA and AUG codons [22,23]. Additionally, various alternative base pairs like methylated cytosine and adenine are still present in current genomes, playing an important role in the epigenetic adaptation of organismal populations to continuous environmental changes [9,24].

Cytosine DNA methylation results from the addition of methyl groups to cytosine C5 residues, and the configuration of methylation within a genome provides trans-generational epigenetic information. These epigenetic modifications can influence the transcriptional activity of the corresponding genes, or maintain genome integrity by repressing transposable elements and affecting long-term gene silencing mechanisms [25,26].

In this scenario, we shall show that all possible DNA molecules and, consequently, genomes can be described by way of finite abelian groups which can be split into the direct sum of homocyclic 2-groups and 5-groups defined on the genetic code. A homocyclic group is a direct sum of cyclic groups of the same order. Any finite abelian group can be decomposed into a direct sum of homocyclic *p*-groups [27], where a *p*-group is a group in which the order of every element is a power of a prime number *p*.

The genetic code algebraic structures under scrutiny in the mentioned references covered rings and vector spaces with a common feature: the corresponding additive group is an abelian group of prime-power order. Next, to aid comprehension of the current work, a brief introductory summary of these groups is provided. The results presented here generalize the application of the genetic code algebras (reported in several publications) to the whole genome.

### 1.1 Reported genetic code abelian groups relevant for the current study

Herein, we assume that readers are familiar with the definition of an abelian group, which can otherwise be found in textbooks and elsewhere, including Wikipedia. Nevertheless, all the abelian groups discussed here are isomorphic to the well-known abelian groups of integers modulo *n*, which are accessible to any reader with a college-level education. An example is the abelian group defined on the set {0, 1, 2, 3, 4}, which corresponds to the group of integers modulo 5, where (2 + 1) mod 5 = 3, (1 + 3) mod 5 = 4, (2 + 3) mod 5 = 0, etc. The underlying biophysical and biochemical reasoning used to define the algebraic operations on the set of DNA bases and on the codon set was given in references [11,13,16].
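As a minimal sketch of the arithmetic just described, the following Python snippet reproduces the three sums quoted above and checks by brute force that addition modulo 5 satisfies the abelian group axioms. The helper names `add_mod` and `is_abelian_group` are ours and are introduced only for illustration.

```python
from itertools import product


def add_mod(a: int, b: int, n: int) -> int:
    """Sum of two residues in the group of integers modulo n."""
    return (a + b) % n


def is_abelian_group(n: int) -> bool:
    """Brute-force check of the abelian group axioms for ({0, ..., n-1}, + mod n)."""
    elems = range(n)
    identity = all(add_mod(a, 0, n) == a for a in elems)
    inverses = all(any(add_mod(a, b, n) == 0 for b in elems) for a in elems)
    commutative = all(add_mod(a, b, n) == add_mod(b, a, n)
                      for a, b in product(elems, repeat=2))
    associative = all(
        add_mod(add_mod(a, b, n), c, n) == add_mod(a, add_mod(b, c, n), n)
        for a, b, c in product(elems, repeat=3)
    )
    return identity and inverses and commutative and associative


# The three sums quoted in the text, computed modulo 5.
sums = [add_mod(2, 1, 5), add_mod(1, 3, 5), add_mod(2, 3, 5)]
print(sums)                 # [3, 4, 0]
print(is_abelian_group(5))  # True
```

The same check can be run for any modulus *n*; closure holds automatically because the result is always reduced modulo *n*.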

#### 1.1.1 The ℤ_{64}-algebras of the genetic code (C_{g})

The ℤ_{64}-algebras of the genetic code (*C*_{g}) and gene sequences were stated several years ago. In the ℤ_{64}-algebra *C*_{g}, the sum operation, defined on the codon set, is a way to consecutively obtain all codons from the codon AAC (UUG), in such a way that the genetic code represents a non-dimensional code scale of amino acid interaction energy in proteins.

A description of the genetic code abelian finite group (*C*_{g}, +) can be found in [11]. Group (*C*_{g}, +) is isomorphic to the group on the set ℤ_{64} (the sum of integers modulo 64), which formally will be expressed as (*C*_{g}, +) ≅ (ℤ_{64}, +). The mapping of the set of codons *X*_{1}*X*_{2}*X*_{3} ∈ *C*_{g} into the set ℤ_{64} is straightforward after considering the bijection A ↔ 0, C ↔ 1, G ↔ 2, U ↔ 3 and the function *g*(*x*) = 4*x*_{1} + 16*x*_{2} + *x*_{3}, where *x*_{i} is the integer assigned to base *X*_{i}. For example, codon AUA maps to 4·0 + 16·3 + 0 = 48.
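The codon-to-integer mapping described above can be sketched in Python as follows. This is only an illustration under the bijection and function stated in the text; the names `codon_to_z64`, `z64_to_codon` and `add_codons` are hypothetical helpers, and reading DNA "T" as "U" is a convenience assumption.

```python
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "U": 3}
INT_TO_BASE = {value: base for base, value in BASE_TO_INT.items()}


def codon_to_z64(codon: str) -> int:
    """Map a codon X1X2X3 to 4*x1 + 16*x2 + x3 (DNA 'T' is read as 'U')."""
    x1, x2, x3 = (BASE_TO_INT[b] for b in codon.upper().replace("T", "U"))
    return 4 * x1 + 16 * x2 + x3


def z64_to_codon(value: int) -> str:
    """Inverse of codon_to_z64 for an integer taken modulo 64."""
    x2, rest = divmod(value % 64, 16)
    x1, x3 = divmod(rest, 4)
    return INT_TO_BASE[x1] + INT_TO_BASE[x2] + INT_TO_BASE[x3]


def add_codons(c1: str, c2: str) -> str:
    """Sum of two codons, carried out as addition modulo 64."""
    return z64_to_codon((codon_to_z64(c1) + codon_to_z64(c2)) % 64)


print([codon_to_z64(c) for c in ["ATA", "CCC", "ATG", "GCC", "AAC"]])
# [48, 21, 50, 25, 1]
print(add_codons("AAC", "AAC"))  # AAG
```

The printed list coincides with the vector of integers quoted for the first five codons of Fig. 1 in Section 2.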

The ℤ_{64}-algebra *C*_{g}, however, is limited to protein-coding regions, while it is well known that, in eukaryotes, only a small fraction of the genome (about 3%), called open reading frame (ORF), encodes for proteins [18]. Since non-coding DNA sequences can have a number of base pairs that is not a multiple of three, complete chromosomes and genomes cannot be described by means of group (*C*_{g}, +). In addition, natural genomic variations that include insertion and deletion mutations (indel mutations) across individuals from the same population and across closely related populations from different species cannot be represented with group (*C*_{g}, +).

#### 1.1.2 The group of the genetic code (C_{g})

Group (*C*_{g}, +) is the additive group of a module over a ring, which, however, does not conform to a vector space. To build a genetic code vector space, a Galois field *GF*(4) structure on the ordered base set *B* = {G, U, A, C} was introduced in reference [13]. In particular, an isomorphism with the Galois field is defined by means of its binary representation, i.e., a unique *GF*(4) up to isomorphism exists, such that a bijection from the DNA base set *B* = {G, U, A, C} to the set of binary duplets (α_{1}, α_{2}) is stated, where α_{i} ∈ {0, 1} for *i* ∈ {1, 2}. For example, the bijection *f* is defined as:

The additive group of bases is the Klein four-group, which is defined by the group presentation *V* = {U, A | U + U = A + A = C + C = G, A + U = C}, i.e., (*B*, +) ≅ ℤ_{2} × ℤ_{2}. Next, the abelian group on the set of codons *B*^{3} was defined as the direct third power *B*^{3} = *B* × *B* × *B* of the group (*B*, +), i.e., (*B*^{3}, +) = (*B*, +) × (*B*, +) × (*B*, +), which is isomorphic to the group (ℤ_{2} × ℤ_{2})^{3}, i.e., (*B*^{3}, +) ≅ (ℤ_{2})^{6}. The sum operation on the set (*B*^{3}, +) follows from the sum operation by coordinates.

As pointed out before by Crick, the first two bases of codons determine the physicochemical properties of amino acids [28]. The four encoded amino acids of every class are either the same or show very similar physicochemical properties. This genetic code regularity is captured by the quotient group *B*^{3}/*G*_{GGA}, where *G*_{GGA} is the subgroup of *B*^{3} integrated by the elements {GGG, GGA} (the elements of the quotient group *B*^{3}/*G*_{GGA} are given in Table 5 from [13]). The quotient group *B*^{3}/*G*_{GGA} is isomorphic to the group (ℤ_{2})^{5}. Each element of this group represents an equivalence class of codons. Two triplets *X*_{1}*X*_{2}*X*_{3} and *Y*_{1}*Y*_{2}*Y*_{3} are equivalent if, and only if, the difference *X*_{1}*X*_{2}*X*_{3} + *Y*_{1}*Y*_{2}*Y*_{3} ∈ *G*_{GGA} (in this group every element is its own inverse, so the difference coincides with the sum). In biological terms, substitution mutations involving codons from the same class will not alter (or at least will not substantially alter in most cases) the physicochemical properties of the encoded protein domains, since even the worst scenario involves amino acids with very close physicochemical properties, with the exception of the codon for the amino acid tryptophan.
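The Klein four-group on the bases and the equivalence classes of the quotient group can be sketched as follows. The bijection G ↔ (0, 0), U ↔ (0, 1), A ↔ (1, 0), C ↔ (1, 1) used here is an assumption: it is one choice consistent with the group presentation given above, and it is not necessarily the bijection *f* of reference [13]. All function names are hypothetical.

```python
from itertools import product

# Assumed bijection between bases and binary duplets (illustrative choice).
BASE_TO_BITS = {"G": (0, 0), "U": (0, 1), "A": (1, 0), "C": (1, 1)}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def add_bases(x: str, y: str) -> str:
    """Klein four-group sum: coordinate-wise addition modulo 2 (XOR)."""
    (a1, a2), (b1, b2) = BASE_TO_BITS[x], BASE_TO_BITS[y]
    return BITS_TO_BASE[(a1 ^ b1, a2 ^ b2)]


def add_codons(c1: str, c2: str) -> str:
    """Sum on the codon set, defined coordinate by coordinate."""
    return "".join(add_bases(x, y) for x, y in zip(c1, c2))


SUBGROUP_GGA = {"GGG", "GGA"}


def equivalent(c1: str, c2: str) -> bool:
    """Two codons are equivalent when their sum lies in the subgroup {GGG, GGA}."""
    return add_codons(c1, c2) in SUBGROUP_GGA


codons = ["".join(t) for t in product("GUAC", repeat=3)]
classes = {frozenset(c for c in codons if equivalent(ref, c)) for ref in codons}

print(add_bases("A", "U"))                             # C
print(len(classes))                                    # 32
print(sorted(next(c for c in classes if "CGA" in c)))  # ['CGA', 'CGG']
```

Under this assumed bijection the 64 codons fall into 32 classes of two codons each, which agrees with the order 2^{5} of the quotient group.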

#### 1.1.3 The ℤ_{125} group of the extended genetic code (C_{e})

The extension of the *genetic code group* (*C*_{g}, +) follows straightforwardly from the extension of the codon set, which is easily accomplished by extending the source alphabet of the standard genetic code, {A, C, G, U}, and, consequently, extending the base-triplet set (extended triplets) as *X*_{1}*X*_{2}*X*_{3}, *X*_{i} ∈ {D, A, C, G, U} [21]. The new algebraic structure (*C*_{e}, +) is isomorphic to the abelian group defined on the set ℤ_{125} (the sum of integers modulo 125), formally, (*C*_{e}, +) ≅ (ℤ_{125}, +). The mapping of the set of codons *X*_{1}*X*_{2}*X*_{3} ∈ *C*_{e} into the set ℤ_{125} is straightforward after considering the bijection D ↔ 0, A ↔ 1, C ↔ 2, G ↔ 3, U ↔ 4 and the function *g*(*x*) = 5*x*_{1} + 25*x*_{2} + *x*_{3} (see Table 1). For example, the extended triplet CCD maps to 5·2 + 25·2 + 0 = 60.
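The extended mapping can be sketched in the same way. The snippet below assumes the bijection and function stated above; reading an alignment gap "-" as base D follows the convention used later for Fig. 3, while reading "T" as "U" and the helper names are our own illustrative choices.

```python
EXT_BASE_TO_INT = {"D": 0, "A": 1, "C": 2, "G": 3, "U": 4}
INT_TO_EXT_BASE = {value: base for base, value in EXT_BASE_TO_INT.items()}


def normalize(triplet: str) -> str:
    """Read DNA 'T' as 'U' and an alignment gap '-' as the extra letter 'D'."""
    return triplet.upper().replace("T", "U").replace("-", "D")


def triplet_to_z125(triplet: str) -> int:
    """Map an extended triplet X1X2X3 to 5*x1 + 25*x2 + x3."""
    x1, x2, x3 = (EXT_BASE_TO_INT[b] for b in normalize(triplet))
    return 5 * x1 + 25 * x2 + x3


def z125_to_triplet(value: int) -> str:
    """Inverse of triplet_to_z125 for an integer taken modulo 125."""
    x2, rest = divmod(value % 125, 25)
    x1, x3 = divmod(rest, 5)
    return INT_TO_EXT_BASE[x1] + INT_TO_EXT_BASE[x2] + INT_TO_EXT_BASE[x3]


print(triplet_to_z125("CC-"), triplet_to_z125("CCT"))  # 60 64
print(z125_to_triplet(60))                             # CCD
```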

#### 1.1.4 The group of the extended genetic code (C_{e})

The Galois field *GF*(5) of the DNA set of bases was introduced in reference [16]. This structure led to the definition of a *GF*(5)-vector space over the set of extended base triplets, isomorphic to the set (ℤ_{5})^{3} [16,29]. But here, we are interested only in the abelian groups (ℤ_{5}, +) and ((ℤ_{5})^{3}, +). After the bijection D ↔ 0, A ↔ 1, C ↔ 2, G ↔ 3, U ↔ 4, the sum operation of two DNA bases follows from the sum operation on the Galois field *GF*(5) (i.e., on ℤ_{5}, the sum of integers modulo 5). For example, C + U ↔ (2 + 4) mod 5 = 1 ↔ A. The sum operation on the set (ℤ_{5})^{3} follows from the sum operation by coordinates.
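A short sketch of the base sum and of the coordinate-wise triplet sum, under the bijection stated above, is given next. The function names are hypothetical and serve only to illustrate the operations.

```python
EXT_BASE_TO_INT = {"D": 0, "A": 1, "C": 2, "G": 3, "U": 4}
INT_TO_EXT_BASE = {value: base for base, value in EXT_BASE_TO_INT.items()}


def add_ext_bases(x: str, y: str) -> str:
    """Sum of two extended bases as addition modulo 5."""
    return INT_TO_EXT_BASE[(EXT_BASE_TO_INT[x] + EXT_BASE_TO_INT[y]) % 5]


def add_ext_triplets(t1: str, t2: str) -> str:
    """Coordinate-wise sum of two extended triplets (no carry between positions)."""
    if len(t1) != 3 or len(t2) != 3:
        raise ValueError("extended triplets must have exactly three letters")
    return "".join(add_ext_bases(x, y) for x, y in zip(t1, t2))


print(add_ext_bases("C", "U"))         # A
print(add_ext_triplets("CCD", "DDU"))  # CCU
print(add_ext_triplets("GUA", "CAU"))  # DDD
```

Unlike the sum modulo 125, each position is added independently here, so the triplet DDD acts as the identity and CAU is the inverse of GUA.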

It is worth noticing that there are 24 ways to define each of the above-mentioned algebraic structures [29,30]. Nevertheless, for each defined genetic code group, there is only one genetic code abelian group up to isomorphism, which leads to their representation as an abelian group where the sum operation corresponds to the sum of integers modulo *n* ∈ {2, 2^{6}, 5, 5^{3}}.

## 2 The General Theoretical Model

Herein, it will be shown that, in a general scenario, the whole genome population from any species or closely related species can be algebraically represented as a direct sum of abelian cyclic groups or, more specifically, abelian *p*-groups. Basically, we propose the representation of multiple sequence alignments (MSA) of length *N* as the direct sum:

*G* = (ℤ_{p_{1}})^{n_{1}} ⊕ (ℤ_{p_{2}})^{n_{2}} ⊕ … ⊕ (ℤ_{p_{k}})^{n_{k}}

where *p*_{i} ∈ {2, 5, 2^{6}, 5^{3}} and *N* = *n*_{1} + *n*_{2} + … + *n*_{k}. Here, we assume the usual definition of direct sum of groups [31]. Let *B*_{i} (*i* ∈ *I* = {1, …, *n*}) be a family of subgroups of *G*, subject to the following two conditions:

1. ∑ *B*_{i} = *G*. That is, the subgroups *B*_{i} together generate *G*.
2. For every *i* ∈ *I*: *B*_{i} ⋂ ∑_{j ≠ i} *B*_{j} = 0.

Then, it is said that *G* is the direct sum of its subgroups *B*_{i}, which formally is expressed as *G* = ⊕_{i ∈ I} *B*_{i} or *G* = *B*_{1} ⊕ … ⊕ *B*_{n}.

In higher organisms, genomic DNA sequences are integrated by intergenic regions and gene regions. The former are the larger regions, while the latter include the protein-coding regions as subsets. The MSA of DNA and protein-coding sequences reveals allocations of the nucleotide bases and amino acids into stretches of *strings*. The alignment of these stretches indicates the presence of substitution and *indel* mutations. As a result, the alignment of whole-chromosome DNA sequences from several individuals from the same or closely related species can be split into well-defined subregions or domains, each of which can be represented as a homocyclic abelian group, i.e., a direct sum of cyclic groups of the same *prime-power* order (Fig. 1). Consequently, each DNA sequence is represented as an *N*-dimensional vector with numerical coordinates representing bases and codons.

An intuitive mathematical representation of MSA is implicit in Fig. 1, with the following observations:

a) Every DNA sequence from the MSA, and every subsequence of it, can be represented as a vector with coordinates defined in some abelian group. For example, the first five codons from the first DNA sequence from Fig. 1, {ATA, CCC, ATG, GCC, AAC} ∈ (*C*_{g}, +), can be represented by the vector of integers {48, 21, 50, 25, 1}, where each coordinate is an element from group ℤ_{64} (see Table 1 from reference [11] and the introduction section).

b) Any MSA can be algebraically represented as a symbolic composition of abelian groups, each of which is isomorphic to an abelian group of integers modulo *n*. Such a composition can be algebraically represented as a direct sum of homocyclic abelian groups. For example, the multiple sequence alignment from Fig. 1 can be represented by the direct sum of abelian groups:

In a more specific scenario, the multiple sequence alignment from Fig. 1 can be represented by the direct sum of abelian 2-groups and 5-groups:

Or strictly as the direct sum of abelian 5-groups:

Although the above *direct sums* of abelian groups provide a useful compact representation of MSA, for application purposes in genomics we would also consider using the concept of direct product (*cartesian sum or complete direct sum*) [31]. Next, let *S* be a set of abelian cyclic groups identified in the MSA *M* of length *N* (i.e., every DNA sequence from *M* has *N* bases). Let *ℓ*_{i} be the number of bases or triplets of bases covered on *M* by group *S*_{i} ∈ *S*, where ∑_{i} *ℓ*_{i} = *N*. Hence, each DNA sequence in *M* can be represented by an element (*b*_{1}, …, *b*_{n}) of a cartesian product, where *b*_{i} ∈ *S*_{i} (*i* = 1, …, *n*) and *n* = |*S*|. The set of all elements (0, …, 0, *b*_{i}, 0, …, 0), where *b*_{i} ∈ *S*_{i} stands in the *i*^{th} place and 0 everywhere else, is a group that is clearly isomorphic to *S*_{i}. In this context, the set of all vectors (*b*_{1}, …, *b*_{n}), with equality and addition of vectors defined coordinate-wise, becomes a group named the direct product (cartesian sum) of the groups *S*_{i}, i.e., *S*_{1} × … × *S*_{n}.

An illustration of the cartesian sum application was given above in observation a).
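The vector representation and the coordinate-wise sum can be sketched as follows. The block layout (a non-coding stretch encoded in ℤ_{5} followed by a coding stretch encoded in ℤ_{64}) and the short non-coding prefix are hypothetical and are not taken from Fig. 1; only the five codons come from observation a). The names `encode` and `add_vectors` are our own.

```python
Z4_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}          # codon blocks (group Z64)
Z5_CODE = {"-": 0, "A": 1, "C": 2, "G": 3, "T": 4}  # base blocks (group Z5), gap read as D


def encode(sequence: str, blocks: list[tuple[str, int]]) -> tuple[list[int], list[int]]:
    """Return (vector, moduli) for one aligned sequence.

    blocks is a hypothetical layout of (kind, number_of_bases) pairs,
    with kind equal to "Z5" (one coordinate per base) or "Z64" (one per codon).
    """
    vector, moduli, pos = [], [], 0
    for kind, n_bases in blocks:
        chunk = sequence[pos:pos + n_bases]
        if len(chunk) != n_bases:
            raise ValueError("block layout is longer than the sequence")
        if kind == "Z5":
            vector += [Z5_CODE[b] for b in chunk]
            moduli += [5] * n_bases
        elif kind == "Z64":
            if n_bases % 3:
                raise ValueError("codon blocks must cover a multiple of 3 bases")
            for i in range(0, n_bases, 3):
                x1, x2, x3 = (Z4_CODE[b] for b in chunk[i:i + 3])
                vector.append(4 * x1 + 16 * x2 + x3)
                moduli.append(64)
        else:
            raise ValueError(f"unknown block kind: {kind}")
        pos += n_bases
    if pos != len(sequence):
        raise ValueError("block layout does not cover the whole sequence")
    return vector, moduli


def add_vectors(u: list[int], v: list[int], moduli: list[int]) -> list[int]:
    """Coordinate-wise sum in the direct product of the cyclic groups."""
    return [(a + b) % m for a, b, m in zip(u, v, moduli)]


layout = [("Z5", 4), ("Z64", 15)]
vec, mods = encode("AC-T" + "ATACCCATGGCCAAC", layout)
print(vec)                          # [1, 2, 0, 4, 48, 21, 50, 25, 1]
print(add_vectors(vec, vec, mods))  # [2, 4, 0, 3, 32, 42, 36, 50, 2]
```

In this sketch a gap inside a codon block raises a `KeyError`, since the standard codon group has no letter for it; such regions would need one of the extended groups instead.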

## 3 Results

Results essentially comprise an application of the fundamental theorem of finite abelian groups [27,31]. By this theorem, every finite abelian group *G* is isomorphic to a direct sum of cyclic groups of prime-power order of the form:

*G* ≅ ℤ_{p_{1}^{α_{1}}} ⊕ ℤ_{p_{2}^{α_{2}}} ⊕ … ⊕ ℤ_{p_{n}^{α_{n}}}

or (in short) *G* ≅ ⊕_{i=1}^{n} ℤ_{p_{i}^{α_{i}}}, where the *p*_{i}'s are primes (not necessarily distinct), the α_{i} are positive integers, and ℤ_{p_{i}^{α_{i}}} is the group of integers modulo p_{i}^{α_{i}}. The abelian group representations of the MSA from Fig. 1 given by expressions (1) and (2) correspond to the cases where the finite abelian group *G* is a direct sum of cyclic groups of *prime-power order*, while expression (3) reflects the fact that any finite abelian group can be decomposed into a direct sum of homocyclic *p*-groups [27,31], in this case *p* = 5.
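The decomposition into cyclic groups of prime-power order can be sketched computationally. The snippet below takes a direct sum of cyclic groups, given as a list of their orders, and returns the multiset of prime-power orders of its canonical decomposition; two such direct sums are isomorphic exactly when these multisets coincide. The function names and the example orders are illustrative assumptions, not expressions taken from Fig. 1.

```python
from collections import Counter


def prime_power_factors(n: int) -> list[int]:
    """Prime powers whose product is n, e.g. 12 -> [4, 3]."""
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            q = 1
            while n % p == 0:
                n //= p
                q *= p
            factors.append(q)
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def canonical_decomposition(cyclic_orders: list[int]) -> Counter:
    """Multiset of prime-power orders of the direct sum of the given cyclic groups."""
    return Counter(q for n in cyclic_orders for q in prime_power_factors(n))


def isomorphic(orders_a: list[int], orders_b: list[int]) -> bool:
    """Two finite abelian groups are isomorphic iff their decompositions agree."""
    return canonical_decomposition(orders_a) == canonical_decomposition(orders_b)


print(isomorphic([64, 5, 5, 125], [125, 5, 64, 5]))  # True
print(isomorphic([64], [2] * 6))                     # False
print(isomorphic([10], [2, 5]))                      # True
```

The second comparison shows that the cyclic group of order 64 and the direct sum of six copies of the cyclic group of order 2 have the same number of elements but different decompositions, so they are not isomorphic.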

As shown in Fig. 1, this abelian group is a heterocyclic group that splits into a direct sum of homocyclic groups of *prime-power order*, each of which splits into the direct sum of cyclic *p*-groups of the same order. For example, in expression (4) we have a subgroup that is a direct sum of 12 homocyclic 5-groups. The case of the representation of the genetic code (as given in [11]) is less evident. It follows from the fact that the genetic code table is integrated by 16 subsets of codons of the form *K* = {*XY*A, *XY*C, *XY*G, *XY*U}, where *X* ∈ *B* and *Y* ∈ *B* are fixed, the sum operation on each set *K* is defined by coordinates as in the set of bases (*B*, ⊗), and codon *XY*A is taken as the identity element. For example, *K* = {CGA, CGC, CGG, CGU} with codon CGA as the identity element. In other words, each set *K* corresponds to the Klein four-group.

Notice that for each fixed length *N* we can build manifold heterocyclic groups *S*_{i}, and each one of them can have a different decomposition into *p*-groups. So, each group *S*_{i} could be characterized by means of its corresponding canonical decomposition into *p*-groups. This last detail is exemplified in Fig. 2, where an exon region from the enzyme *phospholipase B domain containing-2* (PLBD2) simultaneously encodes information for several amino acids and carries the footprint to be targeted by the transcription factor REST. Four possible group representations for this exon subregion are suggested at the top of the figure (panel **a**). These types of protein-coding regions are called *duons*, since their base triplets encode information not only for amino acids but also for transcription enhancers [32-34].

The group representation is particularly interesting for the analysis of DNA sequence motifs, which typically are highly conserved across species. As suggested in Fig. 2, there are some subregions of DNA or protein sequences where few or no gaps are introduced and mostly substitution mutations are found. Such subregions form blocks that can cover complete DNA sequence motifs targeted by DNA-binding proteins like transcription factors, which are identifiable by bioinformatic algorithms like BLAST [36]. Herein, the case of double coding called our attention, where the DNA sequence simultaneously encodes a sequence motif targeted by a transcription factor and the codon sequence encoding for amino acids. Notice that the abelian group defined on the standard genetic code is enough to quantitatively describe these motifs (Fig. 2). However, a further application of group theory, together with additional knowledge of the biological function, reveals a more specific decomposition of the motif into abelian groups.

No matter how complex a genomic region might be, it has an abelian group representation. As shown in Fig. 3, two different protein-coding (gene) models from two different genome populations can lead to the same direct sum of abelian *p*-groups and the same final amino acid sequence (protein). The respective exon regions have different lengths, and gaps (“-”, representing base D in the extended genetic code) were added to exons 1 and 2 (from panel **a**) to preserve the reading frame in the group representation (after transcription and splicing, gaps are removed). Both gene models, from panels **a** and **b**, however, lead to the same direct sum of abelian 5-groups:

An example considering changes on the gene-body reading frames as those introduced by alternative splicing is shown in Fig. 4. Gene-bodies with annotated alternative splicing can easily be represented by any of the groups or (Fig.4a). The splicing scenario can include enhancer regions as well (Fig.4b). As commented in the introduction, cytosine DNA methylation is implicitly included in extended base-triple group representation. Typically, methylation analysis of methylomes is addressed to identify methylation changes induced by, for example, environmental changes, lifestyles, age, or diseases. So, in this case the letter D stands for methylated cytosine, since only epigenetic changes are evaluated. A concrete example with two genes from patients with pediatric acute lymphoblastic leukemia (PALL) is presented in Fig. 5.
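As a rough sketch of how methylation could enter such a representation, the snippet below rewrites methylated cytosines as the letter D and encodes the result base by base in the group of integers modulo 5. The reference sequence, the methylated positions and all function names are hypothetical; they are not taken from Fig. 5, and the encoding works at base level for simplicity rather than on base triplets.

```python
EXT_BASE_TO_INT = {"D": 0, "A": 1, "C": 2, "G": 3, "T": 4}


def mark_methylated(sequence: str, methylated: set[int]) -> str:
    """Rewrite cytosines at the given 0-based positions as the extra letter D."""
    out = []
    for pos, base in enumerate(sequence.upper()):
        if pos in methylated:
            if base != "C":
                raise ValueError(f"position {pos} is {base}, not a cytosine")
            out.append("D")
        else:
            out.append(base)
    return "".join(out)


def encode_z5(sequence: str) -> list[int]:
    """Encode each base as an integer modulo 5."""
    return [EXT_BASE_TO_INT[b] for b in sequence]


def difference(u: list[int], v: list[int]) -> list[int]:
    """Coordinate-wise difference modulo 5; nonzero entries flag changed positions."""
    return [(a - b) % 5 for a, b in zip(u, v)]


reference = "ACGTCGA"  # hypothetical sequence
control = mark_methylated(reference, {1})
patient = mark_methylated(reference, {1, 4})
print(control, patient)                                    # ADGTCGA ADGTDGA
print(difference(encode_z5(patient), encode_z5(control)))  # [0, 0, 0, 0, 3, 0, 0]
```

Under these assumptions, a nonzero coordinate in the difference vector marks a position whose methylation status differs between the two samples.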

It is obvious that the MSA from a whole genome population derives from the MSA of every genomic region, from the same or closely related species. At this point, it is worth recalling that there is not, for example, just one human genome or just one genome from any other species, but populations of human genomes and genome populations from other species. Since every genomic region can be represented by the direct sum of abelian cyclic groups of prime-power order, the whole genome population from individuals from the same or closely related species can be represented as an abelian group, which will be, in turn, the direct sum of abelian cyclic groups of prime-power order.

Hence, results lead us to the representation of whole genome populations of individuals from the same species or closely related species (as suggested in Fig. 1) by means of the direct sum of their group representations into abelian cyclic groups. A general illustration of this modelling would be, for example:

That is, the fundamental theorem of abelian finite groups has an equivalent in genomics.

**Theorem 1**. The genomic architecture of a genome population can be quantitatively represented as an abelian group isomorphic to a direct sum of cyclic groups of prime-power order.

The proof of this theorem is self-evident from the discussion and examples presented here. Basically, the group representations of the genetic code lead to the group representations of local genomic domains in terms of cyclic groups of prime-power order, until the whole genome is covered. As for any finite abelian group, the abelian group representation of genome populations can be expressed in terms of a direct sum of abelian cyclic groups of prime-power order. Any new discovery in the annotation of a given genome population will only split an abelian group, already defined on some genomic domain/region, into the direct sum of abelian subgroups ■.

## 4 Discussions

Under the assumption that the current forms of life are the result of an evolutionary process that started from very simple primordial cells, the current non-coding DNA must be the relict footprint of multiple recombinations of ancient DNA domains in all the permissible forms, which in ancient times were ruled by an ancient genetic code. In consequence, in this scenario, the group representations of the genetic code are logically extended from relatively small local DNA domains to the whole genome.

Examples shown in Figs. 1 to 4 indicate that, whatever the genomic architecture for a given species, the observed variations in individual populations and in populations from closely related species can be quantitatively described as the direct sum of abelian cyclic groups. The discovery/annotation of new genomic features will only lead to the decomposition of previously known abelian cyclic groups representing some genomic subregion into direct sums of subgroups. In such algebraic representation DNA sequence motifs for which only substitution mutations happened are specifically represented by the abelian group , in protein coding regions, and by any or combination of groups or in non-protein coding regions.

Results indicate that the genome architecture of whole populations can be quantitatively studied in the framework of abelian group theory. Two sets of MSA, *S*_{1} and *S*_{2}, could split into different cyclic groups and yet be isomorphic to each other because they have the same canonical decomposition. In particular, the genetic code abelian group is enough for an algebraic representation of the genome population from the same species or closely related species. However, such a decomposition is biologically poor and, as suggested in Figs. 4 to 5, masks relevant biological features of the genome architecture. A further decomposition into the direct sum of abelian groups will only depend on our knowledge of the genome annotation for the specified species.

As suggested in Figs. 3 and 4, base D from the extended genetic code (represented as gaps in the MSA) is useful for preserving the information on the natural reading frame in the abelian group representation. It is worth noticing that, for the transcriptional and splicing enzymatic machinery, the information on reading frame preservation is already in the sequence. Molecular machines perform precise logical operations [38], which in this case result in a sort of molecular *enthymeme* (logical) operation where the conclusion is omitted, obeying the principle of cellular economy. In other words, in the algebraic representation of gene and genome populations, base D carries real biological information.

From the several examples provided here, it is clear that there exists a language for the genome architecture that can be represented in terms of sums of finite abelian groups. Future developments in genome annotation for several species can certainly lead to the discovery of the logical rules of such a language, which determine the possible viable variations in the populations. As suggested in reference [13], the identification of quotient groups (at larger scale) can permit the stratification of large genome populations into equivalence classes corresponding to individual subpopulations, each of them carrying particular viable variations of the species genome architecture. An illustration on a very simplified example is given in Fig. 3, where the extended base triplet CCD (CC-, potentially encoding for the amino acid proline, P) and codon CCT (CCU, encoding for P) belong to the same equivalence class from the extended genetic code shown in Table 1. In other words, the fact that DNA sequence motifs, domains, genes, chromosomes and whole genomes can be algebraically split into sets of equivalence classes gives birth to a new level in the current biological taxonomy, which we would call *genomic algebraic taxonomy of species*.

As indicated in reference [11], natural genomic rearrangements like DNA recombination and translocation at structural and functional domains can be represented as group automorphisms and endomorphisms. Biologically, such a description corresponds to the fact that new genetic information is recreated simply by way of reorganization of the genetic material in the chromosomes of living organisms [4,39]. The analysis and discussion of the application of endomorphism ring theory to describe the dynamics of genome populations is a promising subject for further studies.

Particularly promising is the application of the genomic abelian groups to epigenomic studies, which results when base D stands for methylated cytosine. As suggested in Fig. 5, a precise decomposition of methylation motifs into the direct sum of finite abelian groups can lead to their classification into unambiguous equivalence classes. This opens the door to the application of machine-learning-based bioinformatic approaches to study the methylation changes induced in individual populations by, e.g., environmental changes, the aging process and diseases, which is of particular interest in genomic medicine [40].

The results presented here would have a considerable positive impact on current molecular evolutionary biology, which heavily relies on evolutionary null hypotheses about the past. As suggested in reference [29], the genomic abelian groups open new horizons for the study of molecular evolutionary stochastic processes (at genomic scale), with relevant biomedical applications, founded on a deterministic ground that only depends on the physicochemical properties of DNA bases and amino acids. In this case, the only molecular evolutionary hypothesis needed about the past is a fact: the existence of the genetic code.

## 5 Conclusions

Results to date indicate that the genetic code and, ultimately, the physicochemical properties of the DNA bases on which the genetic code algebraic structures are defined have a deterministic effect on, or at least partially rule, the current genome architectures, in such a way that the abelian group representations of the genetic code are logically extended to the whole genome. In consequence, the fundamental theorem of finite abelian groups can be applied to the whole genome. This result opens new horizons for further genomic studies with the application of abelian group theory, which currently is well developed and well documented [31,41].

Results suggest that the architecture of current population genomes is quite far from randomness and obeys deterministic rules. Despite the random nature of the mutational process, only a small fraction of mutations is fixed in genomic populations. In particular, fixation events are ruled by the genetic code architecture, which, as shown by Sanchez (2018), can be simulated as an optimization process by using genetic algorithms [29]. This points to the study of the dynamics of genome populations as a stochastic-deterministic process. Genome stochasticity derives from the stochasticity of the mutational process and from the stochasticity of biochemical reactions, which gives rise to a rich population diversity and phenotypic plasticity that help to prevent population extinction. The deterministic part derives from the genome architecture, which can be represented in terms of a canonical direct sum of homocyclic abelian groups derived from the genetic code, and which holds for all the individuals from the same population/species.
