Evolutionary patterns and heterogeneity of Dengue Virus serotypes in Pakistan

Comprehensive and systematic examination of Dengue virus (DENV) evolution is essential in the context of Pakistan as the virus presents a significant public health challenge with the ability to adapt and evolve. To shed light on intricate evolutionary patterns of all four DENV serotypes, we analyzed complete genome sequences (n=43) and envelope (E) gene sequences (n=44) of all four DENV serotypes collected in Pakistan from 1994 to 2023 providing a holistic view of their genetic evolution. Our findings revealed that all four serotypes of DENV co-circulate in Pakistan with a close evolutionary relationship between DENV-1 and DENV-3. Genetically distinct serotypes DENV-2 and DENV-4 indicate that DENV–4 stands out as the most genetically different, while DENV-2 exhibits greater complexity due to the presence of multiple genotypes and the possibility of temporal fluctuations in genotype prevalence. Selective pressure analysis in Envelope (E) gene revealed heterogeneity among sequences (n=44) highlighting 46 codons in the genome experiencing selective pressure, characterized by a bias towards balancing selection indicating genetic stability of the virus. Furthermore, our study suggested an intriguing evolutionary shift of DENV-4 towards the DENV-2 clade, potentially influenced by antibodies with cross-reactivity to multiple serotypes providing a critical insight into the complex factors shaping DENV evolution and contributing to the emergence of new serotype. Author Summary The emergence of the fifth serotype of dengue virus heightened our interest in investigating its presence in Pakistan. In our quest to understand the evolving landscape of dengue in Pakistan, we conducted a comprehensive analysis, comparing whole genome sequences and E gene sequences. Notably, we focused on the E gene recognized as the most mutable component and a key determinant of dengue’s virulence. The phylogenetic analysis unveiled fascinating findings, demonstrating a strong genetic affinity between serotypes 1 and 3. Substantially signifying its implications for vaccine development and understanding of cross-immunity dynamics within serotypes. We delved into the genetic dynamics of dengue by subjecting the whole genome of DENV and E gene to neutrality tests. The outcomes of these tests unveiled a critical aspect of dengue virus evolution: the genome is not evolving neutrally. Instead, the E gene experiences selective pressure, indicating a bias towards balancing selection. The finding underscores the complex interplay of factors shaping the genetic diversity of dengue in Pakistan and provides valuable insights into the virus’s adaptive strategies.


Introduction
Dengue infection is prevalent among humans and is primarily found in regions characterized by the tropical and sub-tropical climates.It is estimated that there are approximately 400 million new infections each year on a global scale.The illness caused by dengue fever encompasses a spectrum of manifestations ranging from an acute febrile condition known as dengue fever to dengue hemorrhagic fever or dengue shock syndrome which is a more severe and potentially life-threatening form.The single stranded positive RNA virus behind dengue infection belongs to virus Flaviviridae having a genome size of 11 kb and encodes a polyprotein with a mutation rate of more than 100 times greater than mutation rates of DNA genomes [1].
Neutral theory of molecular evolution states the genetic differences between species and within a population are two sides of the same coin.Because, a large number of evolutionary substitutions that manifest as species differences are mostly neutral, meaning that a specie could accept these without having a substantial adverse impact on its capacity to survive.These variations are termed as evolutionary permissible variations.It is possible to deduce evolutionary permissibility of every potential variant by using multispecies sequence alignment and sophisticated statistical techniques on the evolutionary tree of species.However, evolutionary forbidden variants in the genome may impart an unappreciated role in protection against disease and evolution of new genomes from previous ones.The infection of DENV begins with host cell adhesion, entrance and virus envelope fusion with cellular endosomal membranes carried out by the Class II fusion protein known as the flavivirus E protein.E gene is regarded as a molecular marker of DENV pathogenicity because E protein's adhesion to the target cell containing heparin sulfate is necessary to initiate DENV infection.The genomic sequence of E gene also differs in pathogenic and non-pathogenic strains.Therefore, the antibodies that successfully inactivate one serotype may not work against another due to the structural differences in the E gene [2].
Our study represents a pioneering effort in Pakistan, as it provides the first comprehensive examination of the dengue virus in the region, focusing on its evolutionary landscape.By assembling a dataset comprising a total of 43 complete DENV genomes from serotype 1 to 4 and 46 E gene sequences, we have created the extensive collection of DENV data from Pakistan to date.The phylogenetic analysis unveiled a compelling relationship between DENV-2 and DENV-4.However, it is crucial to recognize the co-circulation of multiple DENV serotypes coupled with heightened human activity, elevated the risk of genetic changes that contribute to the diversity within virus populations.The study also highlights the role of natural selection and genetic bottlenecks in the emergence of new dengue virus serotypes, further emphasizing the significance of ongoing monitoring and research in combating this significant public health threat.DENV Envelope (E) gene selection analysis indicated the existence of diversifying and purifying selection along with two recombination breakpoints that indicates a departure from strict neutrality and the presence of selective forces acting upon viral sequences.

DENV Evolutionary Analysis
Multiple sequence alignment of whole genome sequences testified the 100% conservation of 5ꞌ and 3ꞌ cyclization sequences in all 43 sequences.

Departure from neutrality
The Fisher's Exact test of neutrality revealed a significant departure from strict neutrality, with likelihood values supporting the alternative hypothesis of positive selection for sequence pairs.Similarly, the Codon based Z test indicated significant departures from neutrality with P values less than 0.05, underscoring selective pressure acting on these sequences.Furthermore, the examination of non-synonymous and synonymous substitutions per site unveiled insights into specific selective pressure at play.The employment of the Discrete Gamma Distribution, with a shape parameter of 0.3926, effectively categorized evolutionary rates into five distinct groups.
These rates ranged from 0.01 to 3.50 substitutions per site, providing a detailed view of the varying evolutionary dynamics across different sequence positions.Regarding nucleotide composition, the frequencies exhibited by sequences were A (32.48%), T/U (21.19%),C (20.61%) and G (25.71%).The data strongly support the 3H model (allowing for three-nucleotide substitutions) as the best model for explaining the substitution patterns in your dataset.This suggests that multiplenucleotide substitutions are occurring in our data, and these substitutions significantly improve the model fit compared to models that allow only one-nucleotide substitutions (1H) or two-nucleotide substitutions (2H).

Discussion
Dengue infection was believed to be caused by antigenically four diverse serotypes, DENV Consequently, there is an urgent need for Pakistan to consider the adoption of the dengue vaccine.
The shared genetic similarity may necessitate adjustments in vaccine formulations to ensure broader coverage against both serotypes.It could also have implications for cross-immunity and disease severity.If individuals exposed to one serotype develop partial immunity to another due to genetic similarity, it may affect the pattern of disease spread and severity of infection.This information is crucial for healthcare providers and policymakers to anticipate and manage DENV outbreaks [4].A study was conducted to analyze genetic diversity present within the E gene of DENV-3 serotype so that prevalent amino acid modifications and variability in E protein could be observed [5].
Nucleotide substitution patterns in the genome can be analyzed by Transition/Transversion bias and the R value of 1.94 suggested a moderate bias towards transitions over transversions in the E gene.Similarly, evolutionary analysis of DENV-4 E gene was conducted to observe evolutionary patterns and despite abundant non-synonymus variations, no evidence was found for adaptive evolution in any codon or gene.The combination of a high number of sites (segregating), moderate diversity of nucleotides (π), and a Tajima's D value greater than zero indicates complex evolutionary dynamics at play in these 43 sequences.These results prompt further investigation into the specific factors driving genetic diversity and the potential role of selection in shaping the genetic landscape of these sequences.Such moderate genetic diversity can be indicative of a complex evolutionary history, including both ancestral genetic variation and more recent evolutionary events [6].FEL and SLAC determined sites of codons in the E gene alignment data under selection [7].59 sites highlighted by FEL provide a valuable insight into how natural selection has shaped genetic diversity at these specific sites within the DENV E gene.aBSREL found 12 branches to be under episodic positive selection.The role of episodic positive selection has also been observed in VP1 and VP3 of foot and mouth disease virus [8].The results provide valuable insights into the complex forces shaping genetic diversity within the analyzed lineages.
Episodic diversifying pressure was also used to find codon sites in ORF of HoBi like pestivirus under selection [9].The FUBAR analysis has revealed two important aspects of selection within the dataset: Pervasive Positive/Diversifying Selection: evidence of pervasive diversifying selection was found at 18 specific sites.This suggests that these 18 sites have experienced a consistent trend of favoring genetic variations that lead to changes in the amino acid sequence, potentially due to environmental or functional advantages.Such sites may be adapting to new conditions or evolving to perform specific functions more effectively.The analysis identified pervasive negative selection at 50 sites.Purifying selection is often associated with the preservation of critical protein functions or structural integrity.Mutations that negatively impact fitness are removed or suppressed, leading to the persistence of the original genetic code.The LRT comparing the 3H model to the 1H model has a very high LRT value, showing that the 3H model provides a better fit to the data in a significant manner compared to the 1H model [10].The identification of only two recombination breakpoints in the DENV 1-4 serotypes from Pakistan, raises intriguing questions about the underlying mechanisms driving genetic diversity in these viruses [11].Flaviviruses are known for their ability to undergo occasional recombinations, and the scarcity of such events in the DENV genome, as observed here, aligns with the virus's evolutionary strategy.It appears that DENV has developed mechanisms to predominantly rely on mutations generated during replication and interactions with the host immune system, as opposed to frequent recombination events.Recombinational hotspots were also investigated in PSV2020 strain of Picornavirus [12].
The results of our study revealed 18 codons to be under diversifying selection More no. of purifying codons also supports the results of Tajima's D test, indicating the DENV genome experiencing balancing selection.Similarly, a major role of negative selection was also observed in the HoBi like pestivirus genome in shaping its evolution [9].Remarkable

Data collection Dataset 1
Retrieved 448 sequences of Pakistani Dengue virus isolates from NCBI and ViPR database [13].The sequences encompassed a range of genetic data, comprising complete genome sequences.We exclusively considered sequences that came with associated location information and collection dates.Among 448 sequences, the repeated sequences with same accession no. and size less than 500 bp were removed.Most of the sequences were collected over a span of years from 1994 to 2023 and a total of 43 complete genome sequences were further analyzed.DENV-1 (7 sequences), DENV-2 (24 sequences), DENV-3 (10 sequences) and DENV-4 (2 sequences).
Data was composed of all four serotypes.Reference sequences for all four serotypes were included in the study.Due to unavailability of current serotypes data from 2023, 6 recently reported sequences were included from outside Pakistan to have a clear picture of recent mutational patterns in dengue serotypes.After preparing four separate files containing serotype specific data, concatenated all files for Multiple Sequence Alignment (MSA).MUSCLE alignment was run on MEGA-X [14].

Dataset 2
We acquired all Pakistani dengue virus E gene nucleotide sequences documented till now from the ViPR database.This dataset encompassed 46 E protein sequences from DENV-1 to DENV-4.

MSA and phylogenetic analysis
Dataset 1 comprising a total 43 complete genome sequences of all four serotypes from Pakistan reported till date along with four reference sequences of dengue virus and 6 sequences other than from Pakistan were retrieved from the same databases mentioned earlier to have a clear understanding of mutational patterns in E gene (data available in supplementary file).The MUSCLE results revealed that every sequence from one serotype had a high degree of similarity to reference sequences of the same serotype.Furthermore, conserved and non-conserved regions in the entire genome were also manually screened.Sequence alignment was also performed for the E gene using the same software.

Tamura-Nei model and Maximum Likelihood (ML)
Model for tree construction in MEGA-X was selected by Akaike information criterion (AIC) [14].The Tamura-Nei model and Maximum Likelihood (ML) approach was used to infer evolutionary history [15].Bootstrap consensus tree generated from 1000 replicates was used to analyze evolutionary history of taxa.And the branches associated with partitions that had less than 50% of the bootstrap replicates collapsed.In the Bootstrap test with 1000 replicates, the proportion of replicated trees in which linked taxa clustered together was displayed next to branches [16].The initial tree for heuristic search was created by automatically applying Neighbor-join and BioNJ algorithms to a matrix of pairwise distance calculated using Tamura-Nei model and then choosing the topology with highest log likelihood value.43 nucleotide sequences were included for the complete genome phylogenetic analysis.The final dataset contained a total of 10817 positions.
Phylogenetic analysis was also performed to see evolutionary linkages between E genes of DENV serotypes utilizing the same strategy.

Statistical analysis to predict Neutrality
Complete genome sequences were used to run Fisher's exact test to calculate chances of evidence that the serotypes are evolving in a way that's not random.For each pair of sequences in the datasets A, we looked at the chances of rejecting the idea that they are evolving neutrally and favoring the idea that they are being positively selected.p-value < 0.05 was significant.The variances were estimated by the bootstrap method.
The codon-based Z test was utilized to assess neutrality between sequences with significance considered when P values were less than 0.05 (at the 5% level).This test relies on dS and dN to refer to the number of synonymous and non-synonymous substitutions per site.To analyze evolutionary patterns.The Nei-Gojobori method [17].was employed for these analyses, and pairwise deletion option was used to remove ambiguous positions for each sequence pair, yielding a final dataset with 3275 positions.
To estimate Gamma parameters for site rates, we utilized discrete Gamma distribution with a shape parameter of 0.3926.Tamura-Nei model featuring a gamma distribution was used to determine the patterns and rates of substitution (+G) [18].The gamma distribution comprising 5 categories were incorporated to account for variations in evolutionary rates among sites.The estimated ML values and the automatically computed tree topology produced a maximum-log likelihood of -65617.091.We analyzed the position of every codon including first, second, third and non-coding ones, yielding a final dataset of 10,817 positions.
The analysis considered all codon positions, including the 1st, 2nd, 3rd, and noncoding positions.
For each set of sequences, the ambiguous positions were eliminated using paired deletion to assure accuracy of results.The total number of positions in the final dataset were 10,817 and the evolutionary analysis was performed on MEGA 11.Mathematical representation of the test is given below

Inter-sequence Substitution patterns
To determine if the entire genome sequences were evolving with a similar pattern of substitution or showed noticeable variances in base composition biases, the Disparity index test was used to test the homogeneity of substitutions among sequences.In order to access patterns of substitutions, the non-coding positions first, second, third and non-coding were taken into account resulting in a dataset of 10,817 positions.To access p-values, a Monte Carlo test with 500 repeats was used, with results below 0.05 were of significance.Tamura-Nei (1993)

Selective pressure analysis
The ratio of dN to dS in DENV E-gene sequences were determined using different statistical methods; SLAC, FEL, MEME and FUBAR.We utilized SLAC and FEL both implemented in the HyPhy package, and the sites with p-value <0.1 were significantly considered.
For FEL the statistical significance was determined by assessing the results against the asymptotic χ2 (chi-squared) distribution.This analysis considered variations in synonymous (silent) mutation rates between different sites in the dataset.The Nucleotide GTR model was selected for analysis.
FEL is often used to identify sites in a genetic sequence that may be evolving under positive or negative selection.MEME analysis was performed to find the diversifying selection at various points during evolutionary history.The statistical significance of this discovery was based on a pvalue threshold of 0.1.We conducted MEME analysis to find the diversifying selection at various points during evolutionary history.An evolutionary model based on LRT; Likelihood Ratio Test was used for analysis.At particular codons, we searched for instances of positive selection considering it significant when p-value was less than 0.1.We used the Bayes empirical Bayes approach to identify lineages where diversifying selection took place at a specific codon.Branch types that had codons with Bayes factor greater than 1 were known to be subject to episodic diversifying selection.The dN/dS ratio was divided into three groups for BSR analysis using the MG94 model.If the corrected p-value dropped below 0.1, the null model was deemed to be rejected using the BSR approach.For aBSREL, 70 branches were explicitly evaluated for diversifying selection.The LRT was used to determine significance with a p-value threshold of 0.05.FUBAR in the HyPhy package was used to find purifying and diversifying selection sites with a subsequent likelihood of 0.9.BUSTED v4.1 analysis was performed on the E gene alignment using HyPhy (v2.5.48) integrated in datamonkey.The analysis included site to site synonymous rate variation by setting evidence ratio threshold 10.Using the HyPhy package, a multi-hit analysis was performed to determine the potential impact of additional base hits in our data.
Fig 1 represents the GC content of DENV-2 reference genome and its complete genomic representation.

Fig 1 .
Fig 1. Complete Genomic representation of DENV-2 reference genome with accession no.(NC_001474.2).Percentage of GC content is shown in the entire genome where red represents nucleotides with more than 57 GC content.Orange color shows 50-57%, green color corresponds to 41-49%, light blue color shows 33-40% and dark blue color represents less than 33% GC content in nucleotides.(created in SnapGene)

E
gene phylogenetic analysis suggested a common ancestor and close relationship between DENV-1 and DENV-3 forming a distinct clade, while DENV-2 and DENV-4 are more genetically divergent from other two serotypes indicated by their higher genetic distance.Moreover, DENV-4 with the highest genetic distance appeared to be the most genetically distinct serotype among others as shown in Fig 3.

Figure 3 .
Figure 3. Evolutionary analysis of Dengue virus serotypes (DENV-1, DENV-2, DENV-3 and DENV-4) based on envelope gene sequences.DENV-1 and DENV-3 share a common clade, indicating a closer evolutionary connection.The tree was constructed using Maximum Likelihood with Tamura-Nei model, and the displayed tree represents the highest log likelihood (-9153.31).Branch lengths indicate substitutions per site.

Fig 4 .
Fig 4. Nonsynonymous and synonymous mutations in codons no.7 affecting amino acid composition.The number of non-synonymous mutations are more as compared to synonymous mutations.

Fig 5 .Fig 6
Fig 5. Dense Rate Plot: α; rate of synonymous substitutions at a site, β; rate of nonsynonymous substitutions at a site, substitutions in codons at a p-value threshold of ≤0.1.Whereas, X-axis represent the position of the substitution sites within the gene sequence and Y-axis represent the rate of synonymous and non-synonymous substitutions at a site.The pvalue indicates statistical significance of these observations.

Fig 7 :Fig 8 .
Fig 7: Recombination breakpoints detected in GARD analysis.The X-axis represents the genomic coordinates spanning the range of 0 to 1800, while the Y-axis shows the model average support values ranging from 0.0 to 1.0.Two distinct sharp peaks indicate recombination breakpoints identified at positions 383 and 1069.

Fig 9 .
Fig 9. Phylogenetic tree showing substitutions per site for E-gene of DENV serotypes.

Fig 11 .
Fig 11.(a) Cumulative distribution of LRT and Site LR.Y-axis representing the cumulative LRT scores and X-axis showing site LRT contribution .(b) Site Count vs. Site LR.The Y-axis represents the count of sites and site LRT values are shown on the X-axis.There were 25 sites with ER ≥ 10 for positive selection.The downward and upward sloping curve in the graph signifies the cumulative sum of LRT values when we consider more and more sites within the gene.The steep rise in the curve from left to right suggests a cluster of sites with a less negative and more positive selection sites.Fig Contrasting with the aforementioned graph, which showed a cumulative distribution of LRT values illustrating the overall pattern of positive selection, the Fig delves into the diversity of selective pressure acting on different sites providing a more detailed view of the individual site specific LR values.Moreover, The coexistence of peaks in both the negative and positive LRT value ranges indicated heterogeneity in the data.

Figure 12 .
Figure 12.(a) LRT analysis comparing three nucleotide substitution models (3H, 2H and 1H) for site substitutions (b) Evidence Ratio of Site specific nucleotide substitution in double hit vs. single hit and triple hit vs. double hit substitutions analysis (c) Site Log-Likelihood analysis representing site log likelihood on Y-axis and sites on X-axis.The models include MG94 with Double and Triple instantaneous substitutions.MG94 with Double instantaneous substitutions, Standard MG94.Variations in log likelihood values suggest heterogeneous substitution patterns across the gene.

1 to 4
sharing over 65% of their genomes.Announcement of a 5th DENV serotype in 2013, has raised a concern in public health departments[3].To date, no evidence has been found regarding the emergence of DENV-5 in Pakistan.It also raises a speculation that there might be new serotypes which remain undetectable.New serotypes may evolve as a result of genetic bottleneck, natural selection and genetic recombination.Despite being a region with a high occurrence of dengue infections, our understanding of dengue virus evolution has been limited due to the insufficient availability of dengue genomic data from Pakistan.Previous studies on dengue in Pakistan have been primarily focused on localized outbreaks which typically involve single serotype or genotype specific studies from same or closely related strains.The phylogenetic analysis has revealed a close genetic relationship between DENV-2, the most prevalent and DENV-4, the latest known serotype to have emerged in Pakistan.Despite their distinct serotypes and shared clade, these have significant implications for our understanding of DENV dynamics in the region.The clustering of these two serotypes within the same clade suggests a common evolutionary history and genetic similarity potentially indicating shared ecological niches or environmental conditions conducive to their co-circulation.In Pakistan, current mosquito interventions have proven ineffective in preventing dengue, despite the country being heavily impacted by the disease and contributing significantly to the global dengue burden.
heterogeneity was observed across different sites within the E gene indicating some regions under strong purifying selection to maintain function while others may be under positive selection to adapt to changing environments.However, the study has certain limitations, such as the absence of DENV serotypes sequencing data from Pakistan collected in 2023 and the limited availability of complete genome DENV-4 sequence from Pakistan.Having a more extensive dataset of complete genome sequences would have provided a more comprehensive understanding of the situation regarding dengue virus in Pakistan.Evolutionary analysis of the E gene and complete genome of DENV serotypes consistently revealed a close evolutionary relationship between DENV-1 and DENV-3, forming a distinct clade.DENV-2 and DENV-4 are more genetically divergent from the other two serotypes with DENV-4 being the most genetically distinct.This newfound understanding of genetic relationships can be utilized to develop predictive models for DENV outbreaks in Pakistan encompassing both genetic and environmental factors thus optimizing preparedness and response strategies.Models can also be developed to predict new evolving serotypes from genomic data of available serotypes.
model was also utilized under ML estimate of substitutions matrix to estimate substitution probabilities (r) between nucleotide bases.Where,  = / s determine the number of transitional and v determine the number of transversional substitutions per site.The probabilities were categorized into transitional and transversional substitutions (data available in supplementary file) to ensure relative proportions the sum of substitution probabilities equaled 100% and the determined frequency of nucleotides were A = 32.48%,T/U = 21.19%,C = 20.61%, and G = 25.71%.The Transition/Transversion bias (R) was assessed using the Kimura (1980) 2-parameter model.The model was applied to estimate the probability of substitution (r) between different nucleotide bases differentiating between transitional and transversional substitutions.

Table 2 . Selection Pressure analysis tests results and their comparison. The codons confirmed by at least two Tests are shown as Bold numbers.
Among RNA viruses, recombination events exhibit varying frequencies in positive sense ssRNA viruses where certain families such as Picornaviridae display elevated rates while others like Flaviviridae experience sporadic occurrences or in case of Leviviridae, virtually none at all.