Multifractal Analysis of SARS-CoV-2 Coronavirus genomes using the wavelet transform

In this paper, the 1D Wavelet Transform Modulus Maxima lines (WTMM) method is used to investigate the Long-Range Correlation (LRC) character and to estimate the so-called Hurst exponent of 21 isolated RNA sequences from patients infected by the SARS-CoV-2 coronavirus, downloaded from the NCBI database. Six DNA codings are used: Knucleotidic, Purine, Pyrimidine, Amino, Keto and GC. The results show the LRC character in most sequences, except for some sequences where anti-correlated or classical Brownian motion behavior is observed, indicating that the SARS-CoV-2 coronavirus undergoes mutation from one country to another or within the same country. They also reveal the complexity and heterogeneity of the genome structure organization, far from equilibrium, and its self-organization.


Introduction
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), a member of the Coronaviridae family, causes an illness called COVID-19 that can spread from person to person (Conway, 2020). Its symptoms include fever, headache and tiredness. It causes respiratory difficulties that can lead to death, especially in people with chronic health conditions such as diabetes, arterial hypertension, and heart and lung diseases. At the time of writing, there is no proven antiviral treatment or vaccine for SARS-CoV-2.
The fractal character of the nucleotide distribution in DNA sequences has been widely studied, and many papers have been published on the subject. Arneodo et al (1996) studied the Long-Range Correlation (LRC) character of DNA sequences using the 1D continuous wavelet transform. Zu-Guo et al (2002) introduced a time series model (a statistical point of view) and a visual representation (a geometrical point of view) for DNA sequence analysis, and used the fractal dimension, the correlation dimension, the Hurst exponent and the dimension spectrum (multifractal analysis) to discuss problems in this field. Cattani (2010) studied a digital complex representation of DNA sequences and analyzed their correlations with wavelets: the symbolic DNA sequence is mapped into a nonlinear time series whose analysis reveals fractal shapes and symmetries, and eight H1N1 DNA sequences from different locations around the world are analyzed. Long-range correlations in genomic DNA have also been studied as a signature of the nucleosomal structure. Audit et al (2004) showed that wavelet analysis of DNA bending profiles reveals structural constraints on the evolution of genomic sequences, and Voss (1996) studied the evolution of long-range fractal correlations and 1/f noise in DNA base sequences.
In this paper, the 1D Wavelet Transform Modulus Maxima Lines (WTMM) method is used to demonstrate the monofractal behavior of SARS-CoV-2 RNA sequences downloaded from the NCBI database and to estimate their so-called Hurst exponent, with the goal of investigating the LRC character of these sequences. We begin by describing the different DNA codings that are used.

Different DNA coding
Many numerical codings have been proposed in the literature for the nucleic acids, which are formed by four different nucleotides: Thymine (T), Guanine (G), Cytosine (C) and Adenine (A). Here we use six codings (Messaoudi et al, 2012): Knucleotidic, Purine, Pyrimidine, Amino, Keto and GC. The Knucleotidic DNA coding is: T = 2, G = -2, A = 1, C = -1.
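The Knucleotidic mapping is a simple lookup table that turns a symbolic sequence into a numeric signal u(i). The Python sketch below is a minimal illustration, assuming that uracil (U) in RNA records is read as thymine (T) and that ambiguous symbols such as N are skipped; neither rule is stated in the paper, and the function name `knucleotidic_signal` is hypothetical.

```python
import numpy as np

# Knucleotidic coding as given in the text (Messaoudi et al, 2012).
KNUCLEOTIDIC = {"T": 2.0, "G": -2.0, "A": 1.0, "C": -1.0}


def knucleotidic_signal(sequence: str) -> np.ndarray:
    """Map a nucleotide string to the Knucleotidic numeric signal u(i).

    Assumptions (not from the paper): U is treated as T for RNA input,
    and symbols outside A, C, G, T (e.g. N) are skipped.
    """
    seq = sequence.upper().replace("U", "T")
    return np.array([KNUCLEOTIDIC[b] for b in seq if b in KNUCLEOTIDIC])


# A, U->T, G, G, C are coded; N is skipped -> values 1, 2, -2, -2, -1
u = knucleotidic_signal("AUGGCN")
print(u)
```

Skipping ambiguous bases shortens the signal; replacing them by 0 would be an equally plausible choice that preserves positions.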

Wavelet Transform Modulus Maxima lines and LRC in DNA sequences
For more details about the 1D WTMM method, we refer readers to Arneodo et al (1996) or Ouadfeul and Aliouane (2011).
One of the main strengths of the WTMM method is its ability to identify the mono- or multifractal behavior of a given fractal process: a linear spectrum of exponents, $\tau(q) = qH - 1$ (H is the Hurst exponent), is enough to conclude that the process is monofractal. For more details about this ability, we refer readers to Ouadfeul and Aliouane (2011).
Audit et al (2004) noted that the existence, nature and origin of LRC in DNA sequences have been intensely discussed over the last decades. Besides Fourier and autocorrelation analysis, techniques including mutual information functions, the DNA walk representation, Zipf analysis and entropies have been used for the statistical analysis of DNA sequences. There were objective reasons for this somewhat controversial situation. Most investigations of LRC in DNA sequences relied on different techniques, each with its own advantages and limitations, but all consisted in measuring the power-law behavior of some characteristic quantity, e.g., the fractal dimension of the DNA walk, the scaling exponent of the correlation function or the power-law exponent of the power spectrum. In practice, they therefore all faced the same difficulty, namely finite-size effects due to the finite length of the sequences. Audit et al (2004) demonstrated the need for the wavelet transform microscope to study the LRC character of DNA sequences.
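In the WTMM formalism, a partition function Z(q, a) built from the wavelet-transform maxima scales as a^τ(q), and a monofractal signal with Hurst exponent H gives the linear spectrum τ(q) = qH - 1. The Python sketch below illustrates that idea in a deliberately simplified form and is not the authors' implementation: it swaps the complex Morlet for the first derivative of a Gaussian, sums |T|^q over the local maxima at each scale without chaining them into maxima lines (so only q > 0 is meaningful), and uses an assumed scale range (4 to 64) and q grid. All function names are hypothetical.

```python
import numpy as np


def gaussian_derivative_kernel(a: float) -> np.ndarray:
    """First derivative of a Gaussian sampled at scale a, with 1/a normalization."""
    half = int(np.ceil(4 * a))
    t = np.arange(-half, half + 1) / a
    psi = -t * np.exp(-t**2 / 2)
    psi -= psi.mean()  # keep the discrete kernel exactly zero-mean
    return psi / a


def wtmm_partition(signal, scales, qs):
    """log2 Z(q, a): rows are q values, columns are scales a.

    Simplified: Z sums |T|^q over local maxima of |T| at each scale,
    without linking maxima across scales (valid only for q > 0).
    """
    signal = np.asarray(signal, dtype=float)
    log_z = np.empty((len(qs), len(scales)))
    for j, a in enumerate(scales):
        kernel = gaussian_derivative_kernel(a)
        # psi is odd, so convolution vs correlation only flips the sign of T
        modulus = np.abs(np.convolve(signal, kernel, mode="same"))
        edge = len(kernel) // 2
        m = modulus[edge:-edge]  # drop samples affected by zero padding
        is_max = (m[1:-1] > m[:-2]) & (m[1:-1] >= m[2:])
        maxima = m[1:-1][is_max]
        for i, q in enumerate(qs):
            log_z[i, j] = np.log2(np.sum(maxima**q))
    return log_z


def hurst_from_wtmm(signal, scales=None, qs=None):
    """Fit tau(q) from log-log slopes, then H as the slope of tau versus q."""
    scales = np.geomspace(4, 64, 9) if scales is None else np.asarray(scales)
    qs = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0]) if qs is None else np.asarray(qs)
    log_z = wtmm_partition(signal, scales, qs)
    log_a = np.log2(scales)
    tau = np.array([np.polyfit(log_a, row, 1)[0] for row in log_z])
    slope, intercept = np.polyfit(qs, tau, 1)
    return slope, intercept, tau


# Sanity check on an uncorrelated random walk (classical Brownian motion).
rng = np.random.default_rng(0)
walk = np.cumsum(rng.choice([-1.0, 1.0], size=2**14))
H, c, tau = hurst_from_wtmm(walk)
print(f"H ~ {H:.2f}, intercept ~ {c:.2f}")  # expect H near 0.5, intercept near -1
```

For a monofractal walk the fitted τ(q) values should lie close to a straight line; marked curvature in τ(q) would point toward multifractal behavior, which a full WTMM with maxima-line chaining handles more reliably, especially for q < 0.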
The Hurst exponent H of a DNA walk, estimated with the wavelet transform method, indicates whether the walk is anti-correlated (H < 1/2: anti-persistent random walk) or positively correlated (H > 1/2: persistent random walk); H = 1/2 corresponds to classical Brownian motion (Audit et al, 2004).
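The three regimes can be written as a small decision rule. Because an estimated H is never exactly 1/2, this Python sketch uses a tolerance band around 1/2; the default width of 0.05 and the function name `classify_walk` are illustrative assumptions, not values from the paper.

```python
def classify_walk(h: float, tol: float = 0.05) -> str:
    """Label a DNA walk from its Hurst exponent H.

    tol is an assumed tolerance: estimates within tol of 1/2 are
    treated as classical Brownian motion.
    """
    if abs(h - 0.5) <= tol:
        return "classical Brownian motion"
    if h < 0.5:
        return "anti-correlated (anti-persistent)"
    return "long-range correlated (persistent)"


for h in (0.38, 0.50, 0.62):
    print(h, classify_walk(h))
```

In practice the tolerance would be tied to the estimation error of H for sequences of the given length.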

Data analysis and results discussion
In this part, 21 isolated RNA sequences are analyzed using the 1D WTMM method. These sequences are extracted from 21 GenBank records downloaded from the National Center for Biotechnology Information (NCBI) database; all of them come from people infected by the SARS-CoV-2 coronavirus. Table 01 shows the code of each GenBank record and the origin (country) of each infected patient. These RNA profiles are coded using the six coding methods described above, and the 1D WTMM analysis is then applied to estimate the so-called Hölder exponent of each coded profile. Figure 01 shows the Knucleotidic coding of a 512 bp RNA sequence, and the DNA walk of this sequence is presented in Figure 02. The DNA walk at position n is defined as the cumulative sum of the coded values, $y(n) = \sum_{i=1}^{n} u(i)$, where u(i) is the coded value of the i-th nucleotide. For more details about the DNA walk, we refer readers to Peng (1992).
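The DNA walk of Peng (1992) is the running sum y(n) = u(1) + ... + u(n) of the coded values. A minimal Python sketch follows; the toy input is the Knucleotidic coding of the hypothetical fragment "AUGC" (reading U as T, an assumption for RNA), and the function name `dna_walk` is illustrative.

```python
import numpy as np


def dna_walk(u) -> np.ndarray:
    """DNA walk y(n) = sum_{i=1}^{n} u(i) of a numerically coded sequence u."""
    return np.cumsum(np.asarray(u, dtype=float))


# Knucleotidic values for "AUGC" with U read as T: A=1, U=2, G=-2, C=-1
y = dna_walk([1, 2, -2, -1])
print(y)  # running sums 1, 3, 1, 0
```

The walk, not the coded signal itself, is the profile whose scaling is measured: persistent steps make y(n) wander farther than an uncorrelated random walk of the same length.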
To demonstrate the fractal behavior of this DNA walk, we compute its spectral density and plot it versus frequency on a log-log scale (Voss, 1992). The plot of $\log|S(f)|^2$ versus $\log f$ is clearly linear (see Figure 03), which demonstrates the power-law (scaling) behavior of the spectral density with respect to frequency (Arneodo et al, 1995).
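The log-log linearity can be quantified by fitting the slope of the periodogram, |S(f)|^2 ∝ f^(-β). This Python sketch uses the NumPy FFT periodogram of the mean-removed walk and fits only 0 < f ≤ 0.1 cycles per nucleotide; that fitting range is an assumption, and the link β = 2H + 1 is the standard relation for fractional Brownian motion, used here as a model assumption rather than a result of the paper.

```python
import numpy as np


def spectral_exponent(y, f_max=0.1):
    """Fit log10 |S(f)|^2 = c - beta * log10 f by least squares and return beta.

    S(f) is the discrete Fourier transform of the mean-removed walk y;
    only frequencies 0 < f <= f_max (cycles per nucleotide) are used.
    """
    y = np.asarray(y, dtype=float)
    power = np.abs(np.fft.rfft(y - y.mean())) ** 2
    freqs = np.fft.rfftfreq(len(y), d=1.0)
    keep = (freqs > 0) & (freqs <= f_max)
    slope, _ = np.polyfit(np.log10(freqs[keep]), np.log10(power[keep]), 1)
    return -slope


# Uncorrelated random walk: the periodogram should fall roughly as f^-2.
rng = np.random.default_rng(1)
walk = np.cumsum(rng.choice([-1.0, 1.0], size=2**14))
beta = spectral_exponent(walk)
# Under a fractional Brownian motion model, beta = 2H + 1, so H = (beta - 1) / 2.
print(f"beta ~ {beta:.2f}, H ~ {(beta - 1) / 2:.2f}")  # expect beta near 2
```

The raw periodogram is noisy, which is one reason the text turns to the wavelet transform for a more robust estimate of the scaling exponent.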
The first step is the computation of the Continuous Wavelet Transform (CWT), with the complex Morlet as the analyzing wavelet; for more details about the CWT computation, the parameters of the analyzing wavelet and the scaling method, we refer readers to Ouadfeul and Aliouane (2011). Table 02 shows the average value of the Hurst exponent for each coding method over the 21 RNA GenBank records. The Purine coding (sensitive to A and G concentrations) and the Pyrimidine coding (sensitive to C and T concentrations) show the smallest variation of the Hurst exponent, whereas the Amino coding (sensitive to A and C concentrations) and the GC coding show the largest.
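The CWT step can be sketched with the complex Morlet wavelet ψ(t) = π^(-1/4) e^(iω0 t) e^(-t²/2). In this Python sketch, the centre frequency ω0 = 6, the 1/a normalization, the truncation of the wavelet at ±4a and the function name `morlet_cwt_modulus` are assumptions for illustration, since the paper defers these parameters to Ouadfeul and Aliouane (2011). The usage check feeds a pure cosine of period P and verifies that the modulus peaks near the scale a = ω0 P / (2π), where the wavelet's centre frequency matches the signal.

```python
import numpy as np


def morlet_cwt_modulus(signal, scales, omega0=6.0):
    """|T(b, a)| for a complex Morlet wavelet with 1/a normalization.

    omega0 = 6 is an assumed, commonly used value. The signal must be
    longer than the widest kernel (8a + 1 samples) for mode="same".
    """
    signal = np.asarray(signal, dtype=float)
    out = np.empty((len(scales), len(signal)))
    for j, a in enumerate(scales):
        half = int(np.ceil(4 * a))
        t = np.arange(-half, half + 1) / a
        psi = np.pi**-0.25 * np.exp(1j * omega0 * t) * np.exp(-t**2 / 2)
        # T(b,a) = (1/a) sum_x f(x) conj(psi((x-b)/a)); since conj(psi(-t)) = psi(t)
        # for the Morlet wavelet, this correlation equals convolution with psi / a.
        out[j] = np.abs(np.convolve(signal, psi / a, mode="same"))
    return out


x = np.arange(2048)
period = 32.0
cosine = np.cos(2 * np.pi * x / period)
scales = [8.0, 16.0, 6.0 * period / (2 * np.pi), 64.0]
mod = morlet_cwt_modulus(cosine, scales)
centre = slice(300, -300)  # ignore samples affected by zero padding
best = int(np.argmax(mod[:, centre].mean(axis=1)))
print(best)  # index of the scale omega0 * P / (2 * pi), about 30.6
```

The modulus |T| computed this way is the quantity whose local maxima the WTMM method tracks across scales.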

Conclusions
We have performed a 1D wavelet-based multifractal analysis of 21 RNA profiles downloaded from the NCBI database using the continuous wavelet transform with the complex Morlet as the analyzing wavelet; the analyzing parameters of the wavelet transform modulus maxima lines