ABSTRACT
As the COVID-19 pandemic continues to affect the world, a new variant of concern, B.1.1.529 (Omicron), has been recently identified by the World Health Organization. At the time of writing, there are still no available primer sets specific to the Omicron variant, and its identification is only possible by using multiple targets, checking for specific failures, amplifying the suspect samples, and sequencing the results. This procedure is considerably time-consuming, in a situation where time might be of the essence. In this paper we use an Artificial Intelligence (AI) technique to identify a candidate primer set for the Omicron variant. The technique, based on Evolutionary Algorithms (EAs), has been already exploited in the recent past to develop primers for the B.1.1.7/Alpha variant, that have later been successfully tested in the lab. Starting from available virus samples, the technique explores the space of all possible subsequences of viral RNA, evaluating them as candidate primers. The criteria used to establish the suitability of a sequence as primer includes its frequency of appearance in samples labeled as Omicron, its absence from samples labeled as other variants, a specific range of melting temperature, and its CG content. The resulting primer set has been validated in silico and proves successful in preliminary laboratory tests. Thus, these results prove further that our technique could be established as a working template for a quick response to the appearance of new SARS-CoV-2 variants.
Introduction
On November 26th, 2021, the World Health Organization (WHO) declared a fifth variant of concern (VOC)1, the Omicron variant, belonging to the PANGO lineage as B.1.1.5292, and clade GR/484A in the Global Initiative on Sharing Avian Influenza Data (GISAID)3. The Omicron variant is characterized by the mutations4,5 presented in Table 1.
B.1.1.529 was first discovered in Botswana on November 11th, 2021. It was then quickly identified in South Africa three days later and identified in two cases in Hong Kong4. B.1.1.529 is characterized by at least 30 amino acid substitutions, three small deletions and one small insertion in the viral RNA encoding for the spike protein (S), making it the most divergent variant detected to date4. Importantly, half of the amino acid substitutions in B.1.1.529 are present on its receptor-binding domain (RBD), playing a key role in ACE2 binding and antibody recognition. Several mutations in this domain have been characterized in other variants, including VOCs6,7 and modelled associating them with increased binding affinity to ACE2 and higher rates of transmission (N501Y, Q498R, H655Y, N679K, P681H)8–11, increase in the ability of immune escape from neutralising antibodies (K417N, N501Y, N440K, S477N, T478K, G339D)11–14, reduction in vaccine effectiveness and increased risk of reinfections (E484, K417N, N501Y)13–17. The Omicron Variant has been divided into 3 sublineages: BA.1, BA.2 and BA.3, where as of December 15th, 2021 BA.1 accounts for 95% of available Omicron sequences in GISAID. BA.2 does not present the 69-70 deletion in the S gene, while BA.1 and BA.3 do18.
Since its discovery, B.1.1.529 has been rapidly spreading across the globe and through communities pointing to a higher degree of transmission and possible growth advantage when compared to other variants such as Delta (B.1.617.2)15,19. Although a great deal remain to be uncovered regarding clinical presentation and epidemiology, the majority of cases reported to authorities are either asymptomatic at the time of testing or have mild symptoms (including cough, fatigue, and congestion or runny nose)4,20,21: preliminary data thus suggest a lower disease severity22,23. Since its detection, B.1.1.529 has started to replace the Delta variant in South Africa24–26, and early evidence has linked it to an increased risk of reinfection15–17,27. Due to its highly mutated S protein, B.1.1.529 has been shown to escape the majority of SARS-CoV-2 neutralizing antibodies and retrovirals12,13,28 (except for ensovibep29, remdesivir, molnupiravir and nirmatrelvir30,31). Furthermore, current vaccines technologies tested against B.1.1.529 seem to show lower efficacy13,23,26,28,32,33, however vaccine boosters (including heterologous booster) have been shown to help reducing immune escape31,34–37, and universal vaccines to help fight this variant have been proposed38,39. It is important to note that early data point to a reduced risk of hospital admission among Omicron infected individuals23,27, and a reduced risk of severe outcomes among Omicron re-infected individuals earlier infected by the Delta variant25. Interestingly, a recent study found an increase in Delta variant neutralization from individuals infected with Omicron40, which could result in a decreased ability of reinfection with future and current variants.
Due to the high concern of this variant, one of the best strategies to contain its spread is by properly measuring it. The current identification of variants is usually performed through PCR tests. As we recently showed in41, on barriers influencing vaccine development timelines in Covid-19 Vaccine R&D, fast and correct identification is of highest impact on Covid-19 vaccine development. In other words, the steepest barrier to innovation is “lack of knowledge concerning the pathogen target”. Consequently, AI fast prediction of suitable primers relates directly to outbreak responses, as also recently discussed in41,42. Differentiating between Omicron and the other variants, however, is a difficult task. In practice, the main way of detecting the Omicron variant is by running three targets, and then checking the so-called S gene dropout or S gene target failure1: a failure of one specific test that targets the heavily mutated S gene. The virus is then amplified and sequenced, using a procedure similar to the one followed for identifying the B.1.1.7/Alpha variant43. Depending on the technology adopted and the laboratory conditions in each country, the whole process can take days. Considering the amount of the people that need to be tested in order to properly assess the spread of the variant, this creates the necessity for a faster and reliable test. In addition, if 69-70 deletion is targeted in the S gene, the BA.2 sublineage does not have this mutation. Here we present the design of a primer set for the specific detection of the Omicron variant using the fast and completely automated pipeline built around deep learning and Evolutionary Algorithms (EA) techniques44.
Results and Discussion
Using the methods presented in44, we find 10 different sequences from the 10 runs of the EA technique. Next, we run an analysis to find if any of the sequences presents more than one mutation. The resulting sequence GACCCACTTATGGT-GTTGGTCpresents 3 characteristics mutations to Omicron variant1: Q498R (A23055G), N501Y (A23063T) and Y505H (T23075C), position 23,054 to 23,075 in the reference accession NC_045512.245. Then, we simulated sequence GACC-CACTTATGGTGTTGGTCas a forward primer in Primer3Plus46 using the accession EPI_ISL_6590782, resulting in a warning for High end self complementarity. To solve the issue, we increased the size of the primer by adding a bp at the end (GACCCACTTATGGTGTTGGTCA), which resulted in an acceptable primer candidate with a Tm of 62.0 °C. We then generated the internal probe CACCAGCAACTGTTTGTGGAand reverse primer CTGCCAAATTGTTGGAAAGGwith a Tm of 60.8 °C and 60.5 °C respectively with a product size of 208 bp.
The frequency of appearance of the selected sequences is shown in Table 2. Out of 123 available Omicron sequences, 112 presented the candidate forward primer produced by our technique. From the 11 that do not present the sequence GACCCACTTATGGTGTTGGTCA, 9 had sequencing errors and 2 (1.63%) do not possess mutation Y505H. Further validation was performed on the different Omicron sub-lineages (BA.1, BA.2, BA.3) to verify that the primers work, Table 3. An analysis using 659 sequences, marked as complete and low coverage excluded in GISAID from BA.1 sub-lineage show 88.01% success of the forward primer and one nucleotide change in 2.28% of the sequences, while the rest have sequencing errors in the target.
Laboratory testing of the generated primers, as part of the UniCoV study proves to be successful (see Table 4) in identifying the Omicron variant. Thus, these results prove further that the use of AI to generate specific diagnostic tests could be used as a fast measure to tackle the on-going SARS-CoV-2 pandemic and the appearance of new variants.
Methods and Data
Data
From the Global Initiative on Sharing Avian Influenza Data (GISAID) repository3, we downloaded 123 sequences identified as B.1.1.529/Omicron and 100 sequences of each of the following variants labeled following the Pango lineage2, for a total of 2,100 sequences: AY.3 (Delta Sublineage), B.1, B.1.1.7 (Alpha), B.1.1.214, B.1.1.519, B.1.2, B.1.160, B.1.177, B.1.177.21, B.1.221, B.1.243, B.1.258, B.1.351 (Beta), B.1.427, B.1.429, B.1.526, B.1.596, B.1.617.2 (Delta), D.2, P.1 (Gamma), and R.1 in *.fasta format. Then we kept 23 sequences of B.1.1.529 as test set, while the other 100 were added to the 2,100 sequences of other variants to be used as a training set. The classification problem was defined as binary classification, with B.1.1.529 samples labeled with ‘1’, and the rest labeled with ‘0’. For further validation, we downloaded 140,874 sequences labeled as BA.1, 701 BA.2 and 19 BA.3 from the GISAID repository on January 6th, 2022, to verify whether our designed primer set works for the different Omicron sublineages.
Methods
Creating the Primer Set
To create the primer set, we use a technique based on Evolutionary Algorithms (EAs), that is able to find suitable sequences to identify the Omicron variant44. In summary, this method optimizes two integer values p, k that jointly identify the best candidate forward primer of length 21 at position p in sample k of the training set, see Figure 1. Then gives a cost based on its frequency of appearance marked with label ‘1’, its absence from samples with label ‘0’ (in other words, its specificity to the target variant), its CG content, and melting temperature, as described in more detail in47, where the same technique was used to generate primers for B.1.1.7 (Alpha) variant successfully.
As EAs are stochastic algorithms, and can thus potentially return different candidates at each run, we run the algorithm 10 times to obtain a variety of candidate sequences. After the 10 repetitions, we check the obtained sequences for the number of mutations. Once a sequence containing more than one mutation is identified, we simulate it in Primer3Plus46 using the accession EPI_ISL_6590782 as a reference for Omicron variant.
Laboratory Testing
Raw Saliva samples, approx. 1-2 ml, were collected in 5 ml screwcap containers from volunteers in the UniCoV study48. Following collections samples were heat inactivated at 95°C for 5 min, cooled, vortexed and then 20 ml of Saliva was added to 20 ml of Saliva Ready™ Solution in a 0.2 ml 96 well plate. The plate was vortexed and centrifuged and then heated at 62°C for 5 min, 92°C for 5 min and then cooled at 4°C. Saliva was then screened for SARS-CoV-2 (ORF1a, ORF1b and N gene) and Human RNase P using the TaqMan™ 1-Step Mutiplex SARS-CoV-2 Fast PCR Kit 2.0 on the Applied Biosystems™ QuantStudio™ 5 Real Time PCR Instrument, 96 well, 0.2-mL block (Thermo Fisher Scientific) according to the manufacturer’s instructions including a Positive control. Reverse transcription 53°C for 5 min, 1 Preincubation 85°C for 5 min., Activation at 95°C 2 minutes with 40 cycles of Denaturation 95°C for 1 second and Anneal / extension 62°C for 30 seconds.
Positive SARS-CoV-2 samples were subsequently screened for the presence of Omicron variant using our specific Omicron Primer Set. Briefly cDNA was synthesised using saliva from the Saliva Ready™ step above with LunaScript RT-Supermix (NEB), briefly for primer annealing for 2 min at 25°C, cDNA synthesis for 10 minutes 55 °C 2 minutes and denaturation 95°C for 1 minute. qPCR using followed by second and Anneal / extension 62°C for 30 seconds.
qPCR was performed using cDNA from above with the Luna Universal Probe qPCR mix (NEB), with Omicron specific primers and N1 (2019-nCoV RUO) primers/probes (Integrated DNA Technologies (IDT)) at a concentration 500 nM with probes at 250 nM(FAM-labelled). PCR conditions were denaturation at 95°C fro 1 minutes and 40 cycles of Denaturation 95°C for 15 second and Anneal / extension 60°C for 30 seconds on the Applied Biosystems™ QuantStudio™ 5 Real Time PCR Instrument, 96 well, 0.2-mL block. All PCR reactions were performed in duplicate with a technical replicate performed following initial analysis.
The used data for this manuscript, and the resulting frequency of appearance for each sequence is available in: https://github.com/steppenwolf0/omicronVariant.git
Author contributions statement
CAP, LMM, made the biological analysis, and primer design. ALR and AT made the programming, data collection, and experiments in silico. EC, ADK and JG made the experiment and study design. JMS and JS performed the wet lab validation on these primer sequences (from samples obtained as art of the UnCoV University community screening study in UCC, Cork, Ireland). CAP and ALR wrote the the article, all authors contributed to editing of the article.
Additional information
J. Garssen is a part time employee at Danone Nutricia Research, Utrecht, the Netherlands.
Footnotes
↵* a.lopezrincon{at}uu.nl