Diversity of SARS-CoV-2 genome among various strains identified in Lucknow, Uttar Pradesh

Biswajit Sahoo; Pramod Kumar Maurya; Ratnesh Kumar Tripathi; Jyotsana Agarwal; Swasti Tiwari

doi:10.1101/2021.10.05.463185

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has emerged as a significant challenge worldwide. Rapid genome sequencing of SARS-CoV-2 is under way across the globe to detect mutations and other genomic modifications in the virus. In this study, we sequenced twenty-three SARS-CoV-2-positive samples collected during the first wave of the pandemic in the state of Uttar Pradesh, India. We observed a range of previously reported mutations (2-22), including D614G, L452R, Q613H, Q677H, and T1027I in the S gene; S194L in the N gene; and Q57H, L106F, and T175I in the ORF3a gene. A few unreported mutations, such as P309S in the ORF1ab gene, T379I in the N gene, and L52F and V77I in the ORF3a gene, were also detected. Phylogenetic genome analysis showed similarity with other SARS-CoV-2 viruses reported from Uttar Pradesh. The observed mutations may be associated with SARS-CoV-2 pathogenicity or disease severity.

Introduction

In December 2019, Wuhan, Hubei province, China, first reported numerous pneumonia-like cases of unidentified etiology. The cause was later identified as a novel coronavirus, and the disease was named COVID-19[1]. According to the WHO, as of 13 August 2021, a total of 205,338,159 confirmed COVID-19 cases, including 4,333,094 deaths, had been reported worldwide, with 32,117,826 confirmed cases and 430,254 deaths in India. The state of Uttar Pradesh recorded 1,708,876 confirmed cases and 22,782 deaths according to the Government of India. The first strain, Wuhan-Hu-1, was isolated and its complete sequenced genome was 29.9 kb long[2]. Other coronaviruses that infect humans, including SARS-CoV and MERS-CoV, were identified previously; they have positive-sense RNA genomes of 27.9 kb and 30.1 kb, respectively[3]. To understand the genetic variants of SARS-CoV-2, genome sequencing is an essential tool for tracking cases and determining microbial provenance[4], [5]. The first SARS-CoV-2 genome sequence became publicly available on January 10, 2020 (GenBank ID: MN908947.3)[2]. Since then, many sequences from around the world have been submitted to publicly available databases such as GenBank and GISAID. Studying this extensive genomic sequence data helps to identify mutations that can increase the transmissibility and virulence of the virus[6], [7]. SARS-CoV-2 is an enveloped, positive-sense, single-stranded RNA virus with multiple genes that code for different proteins: open reading frames including ORF1a, ORF1b, ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10; the S gene (surface glycoprotein); the N gene (nucleocapsid phosphoprotein); the M gene (membrane glycoprotein); and the E gene (envelope protein)[8]. Most of the SARS-CoV-2 genome lies in the first ORF (ORF1a/b), which is translated into two replicase polyproteins (pp1a and pp1ab) that give rise to 16 non-structural proteins (NSPs), including the RNA-dependent RNA polymerase, which is important for replication and survival in the host[9].
The remaining ORFs, such as ORF3a, ORF7, and ORF8, code for accessory and structural proteins; however, their roles are not yet fully understood. ORF3a of SARS-CoV has been reported to play a significant role in viral release[10], [11]. The exact roles of ORF7 and ORF8 are still unknown, but several reports suggest that they are involved in viral replication and in host immune responses[12], [13]. Open-source databases such as GenBank (NCBI) and GISAID hold a large volume of SARS-CoV-2 genomic sequence data that can be used to identify mutations and single nucleotide polymorphisms (SNPs) in SARS-CoV-2. SNPs in the SARS-CoV-2 genome lead to commonly reported missense variants such as P323L and P4715L in ORF1ab; D614G and N439K in the S gene; and R203K, R202K, and G204R in the N gene. The P323L and P4715L mutations in ORF1ab play an important role in regulating the RNA-dependent RNA polymerase[14]. The D614G and N439K mutations in the S gene are associated with increased infectivity[15]. The R203K, R202K, and G204R mutations in the N gene are linked with viral survival and replication in the host[16]. Identifying, through genome sequencing, the variants that could escape immunity is essential to stopping the spread of the coronavirus across the globe. In this study, we sequenced the SARS-CoV-2 genome from 23 SARS-CoV-2-positive samples received at Dr. Ram Manohar Lohia Institute of Medical Sciences, Lucknow, during the first peak of the pandemic in the state of Uttar Pradesh, India. Various mutations were observed in the genomic sequences, including previously reported mutations as well as possibly novel ones.

Materials and Methods

Collection of COVID-19 positive RNA Samples

RNA from twenty-three COVID-19-positive samples was obtained from Dr. Ram Manohar Lohia Institute of Medical Sciences, Lucknow. The presence of SARS-CoV-2 was detected using a COVID-19 RT-qPCR kit (LabGun, LabGenomics Co., Ltd., Republic of Korea). Samples with Cq values between 18 and 35 were taken for sequencing. The study protocol was approved by the Institutional Human Ethics Committee, SGPGIMS, Lucknow (Ref. No. 111 PGI/BE/327/2020).

Whole-genome sequencing

For sequencing, libraries were constructed using a ligation kit (SQK-LSK109) as described in the PCR tiling of COVID-19 virus protocol (PTC_9096_v109_revF_06Feb2020; Oxford Nanopore Technologies). Briefly, RNA from 23 swab samples that tested positive for SARS-CoV-2 in the RT-qPCR assay (quantification cycle (Cq) values 18-31; Table 1) was converted into complementary DNA (cDNA). The cDNA products were then amplified using primer pools spanning the whole SARS-CoV-2 genome (i.e., the 400-bp Artic nCoV-2019 V3 panel (https://github.com/artic-network/artic-ncov2019), purchased from Integrated DNA Technologies) according to the manufacturer’s instructions. DNA library preparation (SQK-LSK109, Oxford Nanopore Technologies, United Kingdom), purification using AMPure XP magnetic beads (Beckman Coulter), adaptor ligation, and barcoding with the EXP-NBD104 (barcodes 1-12) or EXP-NBD114 (barcodes 13-24) kits (Oxford Nanopore Technologies, United Kingdom) were done as per the manufacturer’s instructions. DNA libraries were pooled and loaded on an R9.4.1 flow cell (FLO-MIN106, Oxford Nanopore Technologies, United Kingdom). Sequencing was performed using a MinION Mk1B device (Oxford Nanopore Technologies).

Table 1:

Cq values of the isolated samples for SARS-CoV-2 detection.

Genome Assembly, Alignment, and Phylogenetic Analysis

Nanopore sequencing data were basecalled and demultiplexed using Guppy v.3.4.4. Variant analysis was performed using the Artic analysis pipeline v.1.1.3 (https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html) with the recommended settings. Minimum and maximum read lengths in the Artic guppyplex filter were set to 400 and 700 for the 400-bp amplicons. Wuhan-Hu-1 (GenBank ID: MN908947.3) was used as the reference genome.
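The read-length filter described above can be illustrated with a minimal Python sketch. This is not the Artic guppyplex implementation; the function names `parse_fastq` and `filter_by_length` are hypothetical, and treating both bounds as inclusive is an assumption made here for illustration.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality) for one FASTQ record
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from simple 4-line FASTQ input (no line wrapping)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it, "").rstrip("\n")
        next(it, "")  # '+' separator line, ignored
        qual = next(it, "").rstrip("\n")
        yield header, seq, qual


def filter_by_length(
    records: Iterable[FastqRecord], min_len: int = 400, max_len: int = 700
) -> Iterator[FastqRecord]:
    """Keep reads whose length is within [min_len, max_len].

    Inclusive bounds are an assumption of this sketch.
    """
    for header, seq, qual in records:
        if min_len <= len(seq) <= max_len:
            yield header, seq, qual


# Synthetic reads: too short, amplicon-sized, and too long (likely chimeric).
example_lines = [
    "@read_short", "A" * 350, "+", "I" * 350,
    "@read_ok", "A" * 450, "+", "I" * 450,
    "@read_long", "A" * 900, "+", "I" * 900,
]
kept_headers = [h for h, _, _ in filter_by_length(parse_fastq(example_lines))]
print(kept_headers)
```

With 400-bp amplicons, a window of 400 to 700 keeps amplicon-sized reads while discarding short fragments and over-long reads.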

MAFFT was used to align the whole-genome sequences of the 23 genomes to the SARS-CoV-2 reference genome (MN908947.3)[17], [18]. SnpEff v4.3 was used to identify SNPs and the resulting amino acid changes in each gene[19].
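To illustrate how a nucleotide substitution is expressed as an amino acid change, the following minimal Python sketch maps a genome position to a codon and translates the reference and altered codons. It is not SnpEff's implementation; the function name `amino_acid_change` is hypothetical, and the S-gene coding start (21563 in MN908947.3), the A23403G substitution, and the reference codon GAT are supplied by hand as illustrative inputs.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def amino_acid_change(cds_start: int, pos: int, ref_codon: str, alt_base: str) -> str:
    """Return a label such as 'D614G' for a single-base substitution.

    cds_start and pos are 1-based genome coordinates; ref_codon is the
    reference codon that contains pos. Assumes a contiguous coding
    sequence with no frameshift, which is a simplification.
    """
    offset = pos - cds_start          # 0-based offset within the coding sequence
    codon_number = offset // 3 + 1    # 1-based amino acid position
    index_in_codon = offset % 3       # 0, 1 or 2
    alt_codon = ref_codon[:index_in_codon] + alt_base + ref_codon[index_in_codon + 1:]
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return f"{ref_aa}{codon_number}{alt_aa}"


# Illustrative input: A23403G in the S gene (coding start assumed at 21563).
label = amino_acid_change(cds_start=21563, pos=23403, ref_codon="GAT", alt_base="G")
print(label)
```

When the reference and altered codons translate to the same amino acid, the substitution is synonymous; otherwise it is a missense (non-synonymous) change.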

For phylogenetic analysis, the genomes were aligned with previously reported genomes using MAFFT v7.471 and analysed with Nextstrain. Maximum-likelihood trees were generated, and clades and lineages were assigned to the identified variants using Nextclade[20].

Results and Discussion

Twenty-three RT-PCR SARS-CoV-2-positive samples from different periods of the pandemic in Uttar Pradesh were selected for this study. Genome coverage at the end of sequencing ranged from 1000X to 8500X across the samples. Consensus genomes were then obtained by assembly against the reference genome. All the sequenced genomes were submitted to GISAID (Table 2).

Table 2:

Genome similarity of the twenty-three sequenced genomes compared with the NCBI reference genome (MN908947.3), along with total reads generated and genome coverage obtained.

The twenty-three viral SARS-CoV-2 genome assemblies were aligned to the reference genome (SARS-CoV-2 Wuhan-Hu-1 MN908947.3) using MAFFT and genome similarities to the reference genome were calculated (Table 2). All genomes showed more than 95% similarity. Individual genomes were aligned to the NCBI reference genome to predict the mutations in the genome. Using the SnpEff v4.3 tool, various synonymous and missense mutations were detected. A range of 8 to 22 mutations was detected in major genes in twenty-three samples (Table 3).
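The text does not state how genome similarity was computed. One common approach, shown here only as an assumed illustration, is percent identity over the aligned columns of a pairwise alignment, ignoring gaps and ambiguous bases. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(ref_aln: str, qry_aln: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns containing a gap ('-') or an ambiguous base ('N') in either
    sequence are skipped; this convention is an assumption of the sketch.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r in "-N" or q in "-N":
            continue
        compared += 1
        if r == q:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in alignment")
    return 100.0 * matches / compared


# Toy alignment: 12 columns, one gap and one N skipped, one mismatch.
reference = "ACGTACGTACGT"
sample = "ACGTTCGTAC-N"
identity = percent_identity(reference, sample)
print(identity)
```

Other conventions (for example, counting gaps as mismatches) would give different values, so the definition should be stated alongside any reported similarity.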

Table 3:

Mutations detected and their respective amino acid changes when compared to NCBI reference genome (SARS-CoV-2, Wuhan Hu-1, MN908947.3) (* marks indicate the presence of mutation)

Non-synonymous mutations were detected in the ORF1ab, S, N, and ORF3a genes (Table 3). The S-gene mutations identified here, D614G, L452R, Q613H, Q677H, and T1027I, have all been reported previously. The non-synonymous mutation D614G was identified in most of the genome sequences; it is widely reported in the literature to increase infectivity by incorporating more functional S protein into the virion, causing greater severity[21]. Other identified mutations are also linked with increased viral infectivity. Several previously reported mutations, such as S194L in the N gene and Q57H, L106F, and T175I in the ORF3a gene, were also observed. The L52F and V77I mutations in ORF3a were identified in one and three cases, respectively.

A few possibly novel mutations were also observed in ORF1ab, N, and ORF3a (Table 3). The P309S mutation in the ORF1ab gene was identified in four samples. The T379I mutation in the N gene was observed in one case. These mutations indicate the presence of multiple variants of the virus in Uttar Pradesh.

A total of 141 SARS-CoV-2 genome sequences from Lucknow, Uttar Pradesh, were downloaded from GISAID to construct the phylogenetic tree together with our 23 sequenced SARS-CoV-2 genomes. The 23 sequenced genomes fell into clades 20A and 20B: 16 variants were related to clade 20A, and the remaining 6 variants were related to clade 20B (Figure 1). Of the 23 genome sequences, 6 variants were identified as lineage B.1, 8 as lineage B.1.36, 6 as lineage B.1.1.216, and one as lineage B.1.456, while the remaining two variants were not associated with any lineage.

Figure 1:

Phylogenetic analysis of 164 SARS-CoV-2 genome sequences, including 23 variants from the present study and 141 variants from different regions of Lucknow, Uttar Pradesh, retrieved from the GISAID database. Nextclade was used for phylogenetic analysis, and Nextstrain nomenclature is used for all variants, as shown in the figure.

Conclusion

We sequenced twenty-three SARS-CoV-2 genomes from positive clinical samples collected during the first wave in Uttar Pradesh, India. Several of the identified mutations had already been reported, while a few could be novel. Most of the samples had the D614G non-synonymous mutation. Phylogenetic analysis of the isolated viral genomes showed high similarity with previously isolated SARS-CoV-2 genomes from Uttar Pradesh. Future studies are warranted to understand whether these mutations influence host susceptibility, pathogenicity, and virulence.

Acknowledgment

The study was supported by intramural grants (A-24-PGI/IMP/81/2020) and overhead funds to ST. The authors wish to thank Dr. Suman Misra and Dr. Arvind Kumar (Department of Molecular Medicine, SGPGIMS, Lucknow) for their technical help.

References

1. J. Zheng, “SARS-CoV-2: an Emerging Coronavirus that Causes a Global Threat,” Int. J. Biol. Sci., vol. 16, no. 10, pp. 1678–1685, 2020, doi: 10.7150/ijbs.45053.
2. F. Wu et al., “A new coronavirus associated with human respiratory disease in China,” Nature, vol. 579, no. 7798, pp. 265–269, Mar. 2020, doi: 10.1038/s41586-020-2008-3.
3. E. de Wit, N. van Doremalen, D. Falzarano, and V. J. Munster, “SARS and MERS: recent insights into emerging coronaviruses,” Nat. Rev. Microbiol., vol. 14, no. 8, pp. 523–534, Aug. 2016, doi: 10.1038/nrmicro.2016.81.
4. F. P. Esper et al., “Genomic Epidemiology of SARS-CoV-2 Infection During the Initial Pandemic Wave and Association With Disease Severity,” JAMA Netw. Open, vol. 4, no. 4, p. e217746, Apr. 2021, doi: 10.1001/jamanetworkopen.2021.7746.
5. L. J. R. van Elden et al., “Frequent detection of human coronaviruses in clinical specimens from patients with respiratory tract infection by use of a novel real-time reverse-transcriptase polymerase chain reaction,” J. Infect. Dis., vol. 189, no. 4, pp. 652–657, Feb. 2004, doi: 10.1086/381207.
6. S. W. Long et al., “Sequence Analysis of 20,453 Severe Acute Respiratory Syndrome Coronavirus 2 Genomes from the Houston Metropolitan Area Identifies the Emergence and Widespread Distribution of Multiple Isolates of All Major Variants of Concern,” Am. J. Pathol., vol. 191, no. 6, pp. 983–992, Jun. 2021, doi: 10.1016/j.ajpath.2021.03.004.
7. L. van Dorp et al., “Emergence of genomic diversity and recurrent mutations in SARS-CoV-2,” Infect. Genet. Evol. J. Mol. Epidemiol. Evol. Genet. Infect. Dis., vol. 83, p. 104351, Sep. 2020, doi: 10.1016/j.meegid.2020.104351.
8. R. A. Khailany, M. Safdar, and M. Ozaslan, “Genomic characterization of a novel SARS-CoV-2,” Gene Rep., vol. 19, p. 100682, Jun. 2020, doi: 10.1016/j.genrep.2020.100682.
9. J. Cui, F. Li, and Z.-L. Shi, “Origin and evolution of pathogenic coronaviruses,” Nat. Rev. Microbiol., vol. 17, no. 3, pp. 181–192, Mar. 2019, doi: 10.1038/s41579-018-0118-9.
10. N. S. Zhong et al., “Epidemiology and cause of severe acute respiratory syndrome (SARS) in Guangdong, People’s Republic of China, in February, 2003,” Lancet Lond. Engl., vol. 362, no. 9393, pp. 1353–1358, Oct. 2003, doi: 10.1016/s0140-6736(03)14630-2.
11. W. Lu, K. Xu, and B. Sun, “SARS Accessory Proteins ORF3a and 9b and Their Functional Analysis,” Mol. Biol. SARS-Coronavirus, pp. 167–175, Jul. 2009, doi: 10.1007/978-3-642-03683-5_11.
12. T. G. Flower, C. Z. Buffalo, R. M. Hooy, M. Allaire, X. Ren, and J. H. Hurley, “Structure of SARS-CoV-2 ORF8, a rapidly evolving immune evasion protein,” Proc. Natl. Acad. Sci. U. S. A., vol. 118, no. 2, p. e2021785118, Jan. 2021, doi: 10.1073/pnas.2021785118.
13. D. X. Liu, T. S. Fung, K. K.-L. Chong, A. Shukla, and R. Hilgenfeld, “Accessory proteins of SARS-CoV and other coronaviruses,” Antiviral Res., vol. 109, pp. 97–109, Sep. 2014, doi: 10.1016/j.antiviral.2014.06.013.
14. M. Pachetti et al., “Emerging SARS-CoV-2 mutation hot spots include a novel RNA-dependent-RNA polymerase variant,” J. Transl. Med., vol. 18, no. 1, p. 179, Apr. 2020, doi: 10.1186/s12967-020-02344-6.
15. J. Chen, R. Wang, M. Wang, and G.-W. Wei, “Mutations Strengthened SARS-CoV-2 Infectivity,” J. Mol. Biol., vol. 432, no. 19, pp. 5212–5226, Sep. 2020, doi: 10.1016/j.jmb.2020.07.009.
16. T. Tomaszewski et al., “New Pathways of Mutational Change in SARS-CoV-2 Proteomes Involve Regions of Intrinsic Disorder Important for Virus Replication and Release,” Evol. Bioinforma. Online, vol. 16, p. 1176934320965149, 2020, doi: 10.1177/1176934320965149.
17. K. Katoh, G. Asimenos, and H. Toh, “Multiple alignment of DNA sequences with MAFFT,” Methods Mol. Biol. Clifton NJ, vol. 537, pp. 39–64, 2009, doi: 10.1007/978-1-59745-251-9_3.
18. K. Katoh and D. M. Standley, “MAFFT multiple sequence alignment software version 7: improvements in performance and usability,” Mol. Biol. Evol., vol. 30, no. 4, pp. 772–780, Apr. 2013, doi: 10.1093/molbev/mst010.
19. P. Cingolani et al., “A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3,” Fly (Austin), vol. 6, no. 2, pp. 80–92, Jun. 2012, doi: 10.4161/fly.19695.
20. J. Hadfield et al., “Nextstrain: real-time tracking of pathogen evolution,” Bioinforma. Oxf. Engl., vol. 34, no. 23, pp. 4121–4123, Dec. 2018, doi: 10.1093/bioinformatics/bty407.
21. L. Zhang et al., “SARS-CoV-2 spike-protein D614G mutation increases virion spike density and infectivity,” Nat. Commun., vol. 11, no. 1, p. 6013, Nov. 2020, doi: 10.1038/s41467-020-19808-4.