Abstract
Developing intranasal vaccines against pandemics and devastating airborne infectious diseases is imperative. Intranasal vaccines offer clear advantages over injectable systemic vaccines, but developing effective ones is considerably more challenging. Fusing a protein antigen with the catalytic domain of cholera toxin (CTA1) and the two-domain D of staphylococcal protein A (DD) has significant potential for intranasal vaccines. In the present study, we constructed two fusion proteins containing CTA1, tandem repeat linear epitopes of the SARS-CoV-2 spike protein (S14P5 or S21P2), and DD. We analyzed the in silico characteristics of the fusion proteins CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD and their solubility when overexpressed in Escherichia coli. Structural predictions indicated that each component of the fusion proteins retained a structure compatible with that of its native counterpart. Both fusion proteins were predicted by computational tools to be soluble when overexpressed in E. coli. Contrary to these predictions, the constructs exhibited limited solubility. The solubility did not improve even after lowering the cultivation temperature from 37°C to 18°C. Induction with IPTG at the early log phase, instead of the usual mid-log phase, significantly increased soluble CTA1-(S21P2)4-DD but not CTA1-(S14P5)4-DD. The solubility of the overexpressed fusion proteins increased significantly when a non-denaturing detergent (Nonidet P40, Triton X-100, or Tween 20) was added to the extraction buffer. In a scale-up purification experiment, the yields were low, only 1-2 mg/L of culture, due to substantial losses during the purification stages, indicating the need for further optimization of the purification process.
Introduction
The respiratory tract is constantly exposed to the external environment and serves as a primary route for the entry of airborne pathogens, including those causing devastating diseases such as COVID-19, SARS, MERS, influenza, and tuberculosis. It is advantageous for these pathogens to be eliminated by the immune system in the nasal cavity before they enter the lungs or the rest of the body. Eliciting mucosal immune responses in the nasal cavity through intranasal vaccination is a promising approach, as the nasal cavity is equipped with an advanced local lymphoid system, specifically the nasopharynx-associated lymphoid tissue (NALT) [1]. Intranasal vaccines offer several advantages over traditional injectable vaccines, including the ability to induce systemic and mucosal immunity, which is particularly effective for pathogens entering through mucosal surfaces. Additionally, intranasal administration is needle-free, enhancing patient compliance and reducing the risk of needle-associated infections and injuries [2].
The most significant progress in developing intranasal vaccines has occurred for influenza. Several intranasal vaccines, such as FluMist or Fluenz (AstraZeneca) and Nasovac (Serum Institute of India), have been approved for human use. These intranasal vaccines have demonstrated superiority over systemic vaccines, eliciting robust immune responses, including mucosal-neutralizing antibodies and systemic protection against homologous and heterologous viruses, without additional adjuvants [2].
Possibly inspired by the success of the intranasal influenza vaccine, there is significant interest in developing intranasal vaccines for COVID-19. Various types of intranasal vaccines are being developed, including recombinant Sendai virus expressing the receptor-binding domain of SARS-CoV-2 [3], VLP-based vaccines paired with adjuvants [4], and DelNS1-nCoV-RBD live attenuated influenza vaccine [5]. These vaccines have shown efficacy in preclinical and clinical studies, with some at the stage of clinical trials demonstrating safety, immunogenicity, and potential for broader protection against emerging variants like Omicron. Intranasal vaccination presents a promising approach to combat COVID-19.
Most of the aforementioned intranasal vaccines are attenuated or adenovirus-vectored vaccines. While these vaccines induce strong immunity, they require cold chains and often have unacceptable side effects [6]. Subunit vaccines, especially those based on epitopes, are expected to be safer. However, subunit vaccines without potent adjuvants appear ineffective in eliciting protective mucosal immune responses. Inadequate stimulation by a protein antigen may even lead to immune tolerance, which further complicates the issue. Cholera toxin is one of the most potent adjuvants for mucosal vaccination, but its high toxicity prevents its use as an adjuvant.
Agren and colleagues pioneered the elimination of the toxicity of the protein by fusing the catalytic component of the toxin (CTA1) with a two-domain D of staphylococcal protein A (DD). The latter component enables the fusion protein to bind to B cells of all isotypes, which then differentiate into immunoglobulin-producing plasma cells. The resulting fusion protein, CTA1-DD, retains the full adjuvanticity of cholera toxin but is entirely nontoxic [7-9]. CTA1-DD promotes long-term immune responses more effectively than aluminum salts (Alum) and Ribi adjuvants [10]. An intranasal CTA1-DD-adjuvanted H3N2 split influenza vaccine elicited high titers of specific IgA in bronchoalveolar and vaginal lavages, as well as IgM and IgG in the sera of experimental animals [11]. CTA1-DD is not only potent in inducing an IgA response but also effective in preventing the development of immune tolerance [12].
In addition to mixing antigens with CTA1-DD, protein antigens or peptides may be conjugated with CTA1-DD into a single fusion protein to ensure consistent adjuvanticity towards the protein antigen. An intranasal vaccine made by incorporating the ectodomain matrix-2 protein (M2e) of an influenza virus between CTA1 and DD was reported to be highly effective in mice, promoting high specific serum IgG and mucosal IgA and providing strong protection against a potentially lethal challenge infection with influenza virus [13, 14]. An intranasal vaccine for human respiratory syncytial virus (hRSV) made by conjugating the prefusion F protein (RBF) of the virus to the C-terminal end of CTA1-DD was reported to be effective based on vaccination and challenge trials in mice. The CTA1-DD-RBF vaccine stimulated the production of hRSV F-specific neutralizing antibodies (IgG1, IgG2a, sIgA) and T cell immunity in mice, effectively protecting the vaccinated animals from hRSV challenge [15].
In the present study, we aimed to construct fusion proteins consisting of tandem repeat epitopes S14P5 or S21P2 with CTA1 and DD, resulting in CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD fusion proteins, which are prokaryotically expressible, functional, and soluble. Epitopes S14P5 and S21P2 are linear epitopes of the spike protein of SARS-CoV-2 that have been shown to induce neutralizing antibodies [16]. Previously, S14P5 and S21P2 were formulated in tandem repeats and, when injected into rabbits, induced robust antibody responses that recognized the SARS-CoV-2 virus [17]. For these reasons, CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD are expected to elicit strong neutralizing antibody responses and, consequently, to be suitable candidates for intranasal vaccines against COVID-19. The vaccines are expected to be thermostable because they are based on linear epitopes. In this study, we sought to identify potential challenges in protein expression and purification processes that may hinder the development of these fusion proteins as vaccine candidates. By elucidating the factors contributing to limited solubility and low yield, our study aims to provide valuable insights for optimizing the production of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD, thereby enhancing their potential as viable components in future vaccine formulations. The approach used in this study should also be readily applicable to other infectious diseases.
Materials and methods
Construction and in silico analysis of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD
The CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD constructs were generated by conjugating the catalytic subunit of cholera toxin (CTA1) to the N-terminal end of the tandem repeat SARS-CoV-2 epitopes S14P5 or S21P2 and the two-domain D of staphylococcal protein A to the C-terminal end (Fig 1A). The sequence for CTA1 was obtained from a previous study [18], with a modification in which the phenylalanine at residue 132 was replaced with serine to increase protein solubility [19]. The amino acid sequence of domain D of staphylococcal protein A was taken from an earlier publication [20], and the tandem repeats of S14P5 or S21P2 were from previous studies [16, 17]. A peptide linker, GGGS, was placed between CTA1 and the peptide, between the peptide and DD, and between the peptide repeats.
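The linker placement described above can be illustrated with a short sketch. It assumes one GGGS linker between every adjacent pair of components, as the text states; the function name and the three sequences below are hypothetical placeholders, not the actual CTA1, epitope, or DD sequences used in this study.

```python
def assemble_fusion(n_term, epitope, c_term, repeats=4, linker="GGGS"):
    """Join an N-terminal domain, tandem epitope repeats, and a C-terminal
    domain, placing one linker between every adjacent pair of components."""
    parts = [n_term] + [epitope] * repeats + [c_term]
    return linker.join(parts)


# Placeholder strings for illustration only (NOT the real sequences).
CTA1_PLACEHOLDER = "AAAAA"
EPITOPE_PLACEHOLDER = "WWWW"
DD_PLACEHOLDER = "KKKKK"

fusion = assemble_fusion(CTA1_PLACEHOLDER, EPITOPE_PLACEHOLDER, DD_PLACEHOLDER)
print(fusion)
print("linkers:", fusion.count("GGGS"))
```

With four repeats, the layout contains six components and therefore five linkers.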
The characteristics of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD were predicted based on their respective amino acid sequences. The general attributes of the recombinant proteins were assessed using the ExPASy ProtParam tool (https://web.expasy.org/protparam/). Solubility predictions were conducted using NetSolP-1.0 (https://services.healthtech.dtu.dk/services/NetSolP-1.0/) and SoluProt (https://loschmidt.chemi.muni.cz/soluprot/), while the propensity to aggregate was evaluated using Aggrescan (http://bioinf.uab.es/aggrescan/). Tertiary structure prediction of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD was performed using the open-source software AlphaFold2 and ChimeraX [21, 22].
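For readers unfamiliar with two of the sequence-derived attributes reported by ProtParam, the sketch below shows how GRAVY and the aliphatic index are conventionally computed. It uses the Kyte-Doolittle hydropathy scale and the standard aliphatic index coefficients (2.9 for valine, 3.9 for isoleucine plus leucine). It is a minimal illustration and not the code used in this study; the function names are our own, and the short test sequences are arbitrary.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy over all residues.
    Lower (more negative) values indicate a more hydrophilic protein."""
    sequence = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def aliphatic_index(sequence):
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu."""
    sequence = sequence.upper()
    counts = Counter(sequence)

    def mole_percent(aa):
        return 100.0 * counts[aa] / len(sequence)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Arbitrary toy inputs: the GGGS linker and a four-residue aliphatic peptide.
print(round(gravy("GGGS"), 3))
print(round(aliphatic_index("AVIL"), 1))
```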
Expression of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD
Synthetic genes for CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD were generated by reverse translation of the amino acid sequences and codon optimization for prokaryotic expression. The synthetic genes were inserted into pET38a(+) expression plasmids (GenScript Inc., USA).
The plasmids were transformed into competent E. coli BL21 cells using a protocol described in a previous study, with some modifications [17]. Incubation temperatures of 37°C and 18°C were tested for 2 hours and 6 hours to obtain the highest amount of recombinant protein in soluble form. Induction was carried out at OD600 values of 0.1 and 0.4. The concentration of IPTG for induction was 0.3 mM in all experiments. Several solutions were tested for extracting proteins from bacterial cells: native buffer (0.5 M NaCl, 0.1 M sodium phosphate, pH 9) alone, and native buffer supplemented with Nonidet P40, Triton X-100, or Tween 20, each at concentrations of 0.01%, 0.05%, and 0.1%. Chemicals were sourced from Sigma Aldrich.
Analysis of Total and Soluble Fusion Proteins
For the analysis of total protein, 0.5 mL of bacterial culture was centrifuged at 10,000 x g for 5 minutes to pellet the cells. The resulting pellet was solubilized in 50 µL of SDS-PAGE sample buffer, and 10 µL of this suspension was loaded into a well of a 15-well polyacrylamide gel of the Mini-PROTEAN II electrophoresis apparatus (Bio-Rad). The percentage of polyacrylamide was 10% for the separating gel and 4% for the stacking gel.
To analyze soluble protein, 1 mL of bacterial culture was centrifuged at 10,000 x g for 5 minutes to pellet the cells. The pellet was resuspended in 0.5 mL of PBS containing 10 mM EDTA and sonicated for three intervals of 30 seconds each. Following sonication, the sample was centrifuged again at 10,000 x g for 5 minutes, and the supernatant was carefully collected. The proteins in the supernatant were precipitated with trichloroacetic acid according to the standard protocol [23]. The precipitated proteins were then solubilized in 50 µL of SDS-PAGE sample buffer, and 10 µL of this solution was loaded into an SDS-PAGE gel as previously described.
For the immunoblot assay, proteins from the polyacrylamide gel were transferred onto a nitrocellulose membrane, and CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD were identified respectively with rabbit-anti-(S14P5)4 and rabbit-anti-(S21P2)4 antibodies produced in our previous studies [17]. Protein concentrations were assessed by densitometry of protein bands on SDS-PAGE gel or immunoblot, using ImageJ open-source software (NIH, USA).
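The densitometric calculations implied above can be sketched as follows. The sketch assumes that band intensities have already been exported from ImageJ as plain numbers and that the soluble-fraction lane may represent more culture than the total-protein lane (the S1 Fig legend notes such a difference), so the load ratio is passed as a parameter. The function names and the numeric inputs are hypothetical and are not data from this study.

```python
def percent_of_lane(band_intensity, lane_intensity):
    """Target band as a percentage of the summed intensity of its lane."""
    return 100.0 * band_intensity / lane_intensity


def percent_soluble(soluble_band, total_band, load_ratio):
    """Soluble target as a percentage of total target protein.

    load_ratio is how many times more culture the soluble lane represents
    than the total-protein lane; the soluble band is scaled down by it so
    both bands refer to the same amount of culture.
    """
    return 100.0 * (soluble_band / load_ratio) / total_band


# Hypothetical intensities in arbitrary densitometry units.
print(percent_of_lane(band_intensity=305.0, lane_intensity=1000.0))
print(percent_soluble(soluble_band=200.0, total_band=1000.0, load_ratio=4.0))
```

Passing the load ratio explicitly keeps the correction visible, which matters when lanes are loaded with unequal culture equivalents.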
Statistical analysis
Descriptive statistics were used to summarize the data. Differences in protein concentration were analyzed using the Wilcoxon rank-sum test, with p-values less than 0.05 considered statistically significant.
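As an illustration of the test named above, the following sketch computes the Wilcoxon rank-sum statistic and an exact two-sided p-value by enumerating all possible group assignments of the pooled ranks. Exact enumeration is our assumption for illustration (the software used for the analysis is not stated here), and it is practical only for small samples. The function names and the example data are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_exact(group_a, group_b):
    """Return (rank sum of group_a, exact two-sided p-value)."""
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    n_a = len(group_a)
    observed = sum(ranks[:n_a])
    expected = n_a * (len(pooled) + 1) / 2.0
    observed_dev = abs(observed - expected)
    total = 0
    as_extreme = 0
    # Every way of choosing n_a ranks is equally likely under the null.
    for combo in combinations(ranks, n_a):
        total += 1
        if abs(sum(combo) - expected) >= observed_dev - 1e-9:
            as_extreme += 1
    return observed, as_extreme / total


# Hypothetical densitometry readings for two culture conditions.
w, p = rank_sum_exact([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(w, p)
```

With three observations per group, the smallest attainable two-sided p-value is 0.1, which shows why replicate numbers limit the significance that this test can reach.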
RESULTS
In silico analysis of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD
The general characteristics of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD, including the number of amino acids, molecular weight, aliphatic index, Grand Average of Hydropathicity (GRAVY), and instability index, are presented in Table 1. To facilitate a better understanding of the proteins, we included the well-known protein bovine serum albumin (BSA), whose amino acid sequence was obtained from GenBank (accession number AAA51411.1), for comparison.
Both CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD have molecular weights of approximately 47 kDa, are acidic (pI<7), and are more hydrophilic than BSA, as indicated by their lower GRAVY scores. According to the instability index, both proteins were less stable than BSA. The solubility of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD was predicted to be higher than that of the highly soluble BSA. This higher solubility was in agreement with their higher hydrophilicity.
Moreover, CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD were predicted to have a lower propensity for aggregation than BSA, as indicated by the higher number of hot spot areas (nHS) and the nHS normalized per 100 residues (NnHS).
CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD were predicted by online computational tools to be soluble upon overexpression in E. coli. The SoluProt application generated high prediction values, specifically 0.858 for CTA1-(S14P5)4-DD and 0.887 for CTA1-(S21P2)4-DD. In contrast, the NetSolP application yielded considerably lower predictions of only 0.6 for both proteins.
The schematic and tertiary structures of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD are shown in Fig 1. The schematic representation provides an overview of the domain organization and sequences of these constructs (Fig 1A). The CTA1 domain consisted of two groups of three antiparallel beta sheets, several short alpha helices, and connecting loops. The DD component comprised a pair of three long, parallel alpha helices connected by a long coil. The (S14P5)4 appeared as a long coil, whereas the (S21P2)4 appeared as four alpha helices with coils at both ends (Fig 1B, D).
The structures of S14P5 and S21P2 in the predicted structures of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD were similar to those in the cryo-electron microscopy structure of the SARS-CoV-2 spike protein previously submitted to the Protein Data Bank (PDB # 6VXX) (Fig 1C, E).
Expression of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD
The E. coli cells transformed with the CTA1-(S14P5)4-DD or CTA1-(S21P2)4-DD constructs exhibited robust protein expression upon induction with IPTG (Fig 2A). No protein expression was observed before IPTG induction, indicating correct regulation of genetic constructs and a responsive expression system. Substantial expression levels were evidenced by the intense protein bands observed in the gels. Densitometry analysis determined the expression levels of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD to be 30.5% and 45.8% of total cell protein, respectively. Immunoblot analysis using monospecific anti-(S14P5)4 and anti-(S21P2)4 antibodies confirmed the identity of the expressed proteins (Fig 2B, C), validating their expected characteristics. The molecular weights of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD were determined to be 61.2 kDa and 61.4 kDa, respectively, significantly higher than the calculated weights based on their amino acid compositions, which were 46.5 kDa and 46.9 kDa, respectively (Table 1).
Effect of growth stage (OD600) at induction, temperature and duration of incubation
Contrary to the in silico prediction, both CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD were poorly soluble when overexpressed in E. coli. The soluble fractions constituted only a negligible portion of the total proteins, and exact quantification was often unreliable due to smearing observed in many protein bands (S1 Fig, S1 Table). Lowering the culture temperatures from 37°C to 18°C did not increase protein solubility (p>0.05). In fact, lowering the temperature decreased the amount of soluble CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD. Prolonging incubation from 3 to 6 hours increased the amount of soluble proteins (p<0.05). Induction at an early log phase (OD600 of 0.1) resulted in higher production of soluble CTA1-(S21P2)4-DD (p<0.05), but not for CTA1-(S14P5)4-DD (S2 and S3 Tables).
Effect of detergent in the extraction and purification buffer
Protein purification using the native buffer (0.5 M NaCl, 0.1 M sodium phosphate, pH 9) with the NiNTA purification system yielded low amounts of CTA1-(S14P5)4-DD and even lower amounts of CTA1-(S21P2)4-DD (Fig 3). Adding detergents to the native buffer significantly increased the yields, particularly for CTA1-(S21P2)4-DD. Supplementing the buffer with 0.01% Nonidet P40 or Triton X-100 resulted in a fivefold increase in CTA1-(S14P5)4-DD yield and a seven- to eightfold increase for CTA1-(S21P2)4-DD, as measured by densitometry. However, 0.01% Tween 20 increased yields only twofold for both proteins. Further yield improvements were observed with higher concentrations of detergents, including Tween 20 at 0.05% and 0.1%. Yields appeared to plateau between 0.05% and 0.1% detergent, as increasing the concentration from 0.05% to 0.1% did not result in significant additional increases in soluble protein amounts.
Efficiency of purification of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD
The efficiency of purifying CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD was tested under the optimized conditions determined in this study, namely a culture temperature of 37°C and an extraction buffer containing 0.1% Nonidet P40, scaled up to a larger culture volume. The results of these experiments are detailed in Tables 2 and 3 and S2 Fig. After a 2-hour induction period, the bacterial cells contained 24-42 mg/L of CTA1-(S14P5)4-DD and 32-40 mg/L of CTA1-(S21P2)4-DD, based on densitometry using BSA as the standard. Despite the substantial initial protein amounts, the yields were low at the end of the purification process. Following elution, dialysis, and concentration in buffers without detergent, the yields were less than 5% of the target protein present in the bacterial cells for both proteins. Adding Nonidet P40 to the elution and dialysis buffers only marginally increased the yields, to 2.4 mg/L of culture for CTA1-(S14P5)4-DD and 1.15 mg/L for CTA1-(S21P2)4-DD. These yields in the scale-up experiment contrasted with the previous experiments, in which the addition of Nonidet P40 to the extraction buffer led to a sevenfold increase in the yield of CTA1-(S14P5)4-DD and a sixteen-fold increase in the yield of CTA1-(S21P2)4-DD (Fig 3). The most significant protein losses occurred during the NiNTA column separation stage, with over 90% of the target protein being lost. Only a slight reduction was apparent at the lysate preparation stage, although this may reflect overestimation of the target protein caused by band smearing, which possibly indicates contamination with bacterial nucleic acids released by sonication (S2 Fig, Lane 2). Further substantial losses occurred during concentration and desalting in PBS, with more than 50% of the target proteins present in the eluate being lost. Adding Nonidet P40 to the elution buffer increased recovery, although significant losses could not be prevented.
The discrepancy in yields between the small and large-volume experiments necessitates further fine-tuning of the experimental conditions.
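The stage-by-stage losses described above can be tabulated with a small calculation. The sketch below first converts a densitometry-derived band amount to mg per litre of culture, since each gel lane may represent a different culture volume (as in S2 Fig), and then expresses each stage as a percentage of the starting amount and of the preceding stage. The function names are our own, and all numbers are hypothetical round values chosen for illustration; they are not the values in Tables 2 and 3.

```python
def mg_per_litre(band_ug, culture_ul):
    """Convert micrograms of target in a lane to mg per litre of culture,
    given the culture volume (in microlitres) that the lane represents."""
    return band_ug * 1000.0 / culture_ul


def stepwise_recovery(stages):
    """Return (stage, % of starting amount, % of previous stage) per stage.

    stages is an ordered list of (name, mg of target per litre of culture).
    """
    start = stages[0][1]
    previous = start
    rows = []
    for name, amount in stages:
        rows.append((name, 100.0 * amount / start, 100.0 * amount / previous))
        previous = amount
    return rows


# Hypothetical example: 2 ug of target in a lane loaded from 50 uL of culture.
print(mg_per_litre(band_ug=2.0, culture_ul=50.0))

# Hypothetical amounts (mg/L) at each purification stage.
example_stages = [
    ("cells after induction", 100.0),
    ("cleared lysate", 80.0),
    ("NiNTA eluate", 8.0),
    ("dialyzed and concentrated", 4.0),
]
recovery_table = stepwise_recovery(example_stages)
for name, pct_start, pct_prev in recovery_table:
    print(f"{name:28s}{pct_start:7.1f}% of start{pct_prev:7.1f}% of previous")
```

Reporting recovery relative to the previous stage makes it easier to see which step contributes most to the overall loss.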
DISCUSSION
In the present study, we constructed two fusion proteins, CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD, as candidates for intranasal vaccines against SARS-CoV-2. Each protein contains a linear epitope, S14P5 or S21P2, of the SARS-CoV-2 spike protein, which is efficacious in inducing neutralizing antibodies [16]. The linear epitopes were made in the form of four tandem repeats. Our previous study confirmed that presenting S14P5 and S21P2 in this form enhanced the immunogenicity of the epitopes, and the induced antibodies effectively recognized the SARS-CoV-2 virus [17]. To further enhance the immune responses and to optimize them as intranasal vaccines, each of the tandem repeat epitopes was conjugated to the catalytic subunit of cholera toxin (CTA1) and to domain D of staphylococcal protein A in dimer form (DD). The choice of domain D, instead of other domains, was beneficial due to its ability to bind not only IgG but also IgA and IgM [20].
In silico analysis of protein structure and function has become indispensable in modern biomedical research, particularly with the rapid advancement of bioinformatics tools. These computational methods can predict protein behavior, provide insights into protein interactions, stability, and potential binding sites, and guide the design of experiments, thus saving time and resources. To assess the functionality of the components within the fusion proteins, CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD, we performed in silico analyses to predict their compatibility with their native structures. The three-dimensional structures of the fusion proteins were predicted using AlphaFold-2, a computational method that predicts 3D protein structures from their respective amino acid sequences with near-experimental accuracy [22].
The structures of the components of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD, as predicted by AlphaFold-2, were compatible with their original structures, supporting the proper function of the proteins. CTA1, the catalytic domain of cholera toxin with ADP-ribosyltransferase and NAD-glycohydrolase activities, can be delineated into three discrete regions: CTA11 (residues 1-132), CTA12 (residues 133-161), and CTA13 (residues 162-193). The CTA11 forms a compact globular unit characterized by a combination of alpha-helices and beta-strands, harboring a catalytic cleft presumed to be the site of NAD and substrate binding. The CTA12 acts as a flexible linker bridging CTA11 and CTA13. The CTA13 is marked by a dense arrangement of hydrophobic residues [18]. Our predicted molecular models of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD align well with the established crystallographic structure. Specifically, residues 1-132 exhibit a similar secondary structure composition of alpha-helices and beta-strands, consistent with the CTA11 region. Residues 133-161 manifest as an elongated coil, consistent with the characteristics of CTA12, while residues 162-193 adopt a conformation resembling a tangled loop interspersed with short alpha helices, indicative of CTA13.
The DD component of the proteins appeared as a pair of three long, parallel alpha helices. This structure is compatible with the previously reported crystal structure of domain D of staphylococcal protein A [20]. The secondary structures of epitopes S14P5, which appeared as a long loop, and S21P2, which appeared as an alpha helix and loops, were compatible with the relevant segments in the structure of the SARS-CoV-2 spike protein deposited in the Protein Data Bank [24].
Proper design constitutes the initial step in obtaining CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD that function as intended. The subsequent crucial step involves the expression and purification of the constructed fusion proteins while preserving their structure and function. Prokaryotic expression remains the preferred choice for protein expression due to its simplicity and cost-effectiveness. However, a significant challenge in prokaryotic expression is the formation of insoluble protein aggregates [25, 26]. Expressing proteins in soluble forms within E. coli offers notable advantages. Soluble expression favors the maintenance of the proteins’ native conformation, thereby preserving their functionality. In contrast, expression in inclusion bodies often requires denaturation and subsequent renaturation steps, posing risks of protein function loss. Moreover, isolating proteins from insoluble inclusion bodies entails higher costs and longer processing times [27].
Predicting protein solubility before initiating expression confers significant advantages, particularly for engineered constructs like CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD. Therefore, it was imperative to evaluate their solubility, particularly upon overexpression in E. coli. However, current bioinformatic tools still exhibit limited reliability in predicting the solubility of overexpressed proteins, as evident from this study. Based on their amino acid sequences, both CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD were predicted to be highly soluble. In fact, these proteins displayed a notably higher predicted probability of solubility than bovine serum albumin (BSA), a protein renowned for its solubility. However, highly soluble proteins are not necessarily soluble when overexpressed in E. coli [28], and this proved true for both CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD, which were predicted to be soluble but were poorly soluble on overexpression.
The solubility of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD on overexpression was predicted by two computational tools: (1) SoluProt, which predicts solubility based on amino acid composition, dipeptide composition, and physicochemical properties, employing a machine learning approach, and (2) NetSolP, which predicts solubility based on amino acid composition, secondary structure, and solvent accessibility, employing a neural network-based approach.
These tools have undergone significant improvements, achieving accuracies of 74% for SoluProt and 70% for NetSolP, surpassing many previous methods [29, 30]. However, despite these advancements, there remains a risk of misprediction, particularly for engineered proteins like CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD, as these computational tools may not have been trained on similar proteins.
In experiments aimed at optimizing the solubility of expressed protein in E. coli, quantifying expressed protein is challenging. In the present study, we used densitometry techniques to quantify the proteins in SDS PAGE gels or immunoblots with specific antibodies. This technique can measure both soluble and insoluble proteins. Additionally, it can be used to measure proteins in mixtures with other unrelated proteins present in the samples. Despite these advantages, the technique requires high-resolution SDS PAGE gels or immunoblots that allow for the complete separation of proteins, which is often unattainable.
Formation of insoluble expressed proteins as inclusion bodies is one of the most significant obstacles in prokaryotic expression systems. Inclusion bodies form when proteins are expressed at a high rate, which is itself a key objective in recombinant protein production. When protein expression exceeds the host cells’ capacity for post-translational modification and folding, misfolding occurs, leading to aggregation into inclusion bodies as hydrophobic residues become exposed. Slowing down the expression rate by modifying culture conditions, such as lowering the culture temperature and the concentration of induction agents, is the most practical approach to increasing the solubility of expressed proteins. Lowering the temperature also diminishes the hydrophobic interactions between the expressed proteins, thus reducing aggregation [26].
Attempts to convert expressed proteins in E. coli from insoluble inclusion bodies to soluble forms by lowering the cultivation temperature have proven successful in many proteins. For instance, a three-fold increase in the soluble fraction of Green Fluorescent Protein (GFP) was achieved by lowering the temperature from 37°C to 16°C [31]. Similarly, human interferon-α2 and γ, which formed insoluble aggregates when expressed in E. coli at 37°C, demonstrated increased solubility by 30-90% when cells were cultivated at 23°C [32]. The increased solubility of heterologous proteins expressed in E. coli at lower temperatures has also been observed in various other proteins, including β-lactamase, human epidermal growth factor, human hemoglobin, and β-galactosidase [33-35]. However, the effectiveness of lowering the culture temperature in increasing protein solubility varies among different proteins [26]. As demonstrated in this study, lowering the cultivation temperature to 18°C instead of the typical 37°C did not yield the expected increase in solubility.
Moreover, in a previous study, induction of P5βR-transformed E. coli with IPTG during the early log phase substantially increased the solubility of the expressed P5βR (progesterone 5β-reductase) [36]. The increase in the solubility of expressed proteins by IPTG induction at the early log phase is likely dependent on the protein. In the present study, expression of soluble CTA1-(S21P2)4-DD increased significantly, but that of CTA1-(S14P5)4-DD did not. It is unknown why the solubility of some expressed proteins increases while that of others does not.
The production of fusion proteins, CTA1-DD or CTA1-peptide-DD, utilizing E. coli expression systems has been extensively explored in various studies. Despite numerous attempts, the expression of CTA1-DD or CTA1-peptide-DD consistently yields insoluble inclusion bodies, with no reported instances of soluble expression. This phenomenon has been observed across different E. coli expression systems, including the pUC19 expression vector with TG-1 E. coli in 2YT medium [8, 19, 37-39], DH5 E. coli in SYPPG medium [13], as well as the use of pET28a with BL-21 E. coli strains [15]. Despite forming insoluble inclusion bodies, the extracted proteins have been successfully purified using guanidine HCl and chromatography techniques, followed by subsequent renaturation steps. Remarkably, the purified proteins have exhibited functionality despite these initial hurdles, as evidenced by various assays. Specifically, the ADP-ribosyltransferase activity of CTA1 has been validated through NAD: agmatine assay [7, 19, 37, 39], while the DD component has been demonstrated to bind specifically to immunoglobulin receptors on B cells [7].
In previous studies, purification of the fusion protein CTA1-DD expressed in E. coli as inclusion bodies involved solubilization with guanidine-HCl, isolation using an IgG-Sepharose affinity column, and renaturation, resulting in yields of 8-60 mg/L of culture [19, 40]. Although different proteins cannot be compared directly in this aspect of purification, the yields of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD using native techniques were considerably lower.
Despite efforts to enhance solubility in the present study, purification using native techniques yielded only 1-2 mg of pure proteins/L of culture.
The low yield of native CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD was due to substantial losses during the purification stages. Higher yields are likely achievable by optimizing the purification process, as the expressed fusion proteins constitute 30-45% of the total bacterial protein after just 3 hours of induction. Additionally, as found in this study, the inclusion of non-denaturing detergents such as Nonidet P40, Triton X-100, or Tween 20 in the extraction buffer effectively solubilizes the expressed CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD.
The approach used in this study for intranasal vaccine development can be readily adapted to other infectious diseases. For diseases that have existed for some time, linear epitopes that evoke neutralizing antibodies are likely to have been identified already. Fusion proteins consisting of these tandem repeat epitopes and CTA1-DD can be constructed for mucosal vaccines. For newly emerging infectious diseases, the complete sequence of the causal agent is likely to become available very quickly, as occurred with SARS-CoV-2. Linear epitopes can be identified from protein sequences using computational tools, which are becoming increasingly accurate in their predictions. As a result, vaccine candidates could become available quickly, before the disease spreads to wider areas.
In summary, two fusion proteins, CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD, have been constructed as candidates for intranasal vaccines against COVID-19. The constructs were expressed abundantly in E. coli as insoluble inclusion bodies. The solubility of the expressed proteins did not increase by lowering the cultivation temperature. Expression of soluble CTA1-(S21P2)4-DD, but not CTA1-(S14P5)4-DD, increased when the culture was induced with IPTG at the early log-phase growth. The solubility of both CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD was significantly enhanced by adding non-denaturing detergents to the extraction buffer.
Supporting information
S1 Fig. Visualization of CTA1-(S14P5)4-DD (A) and CTA1-(S21P2)4-DD (B) proteins using specific antibodies in immunoblot assays. Proteins were analyzed as either soluble fractions or total proteins within bacterial cells. Cultures were grown at 37°C or 18°C, incubated for 3 or 6 hours, and induced with IPTG at early log-phase (OD600 of 0.1) or mid-log-phase (OD600 of 0.4). Note that the samples for the soluble fractions were derived from four times more bacterial cells than those for total protein.
S2 Fig. Purification of CTA1-(S14P5)4-DD and CTA1-(S21P2)4-DD, each from 150 mL cultures. Bacterial cell lysates were prepared in PBS containing 0.1% Nonidet P40 and subjected to NiNTA column chromatography. Proteins adsorbed by the column were eluted with either 0.5 M imidazole (A, C) or 0.5 M imidazole containing 0.1% Nonidet P40.
Lane 1: Proteins from 50 µL culture after induction
Lane 2: Proteins from the supernatant of cell lysate from 125 µL culture
Lane 3: Eluate from the NiNTA column derived from 500 µL culture
Lane 4: Dialyzed and concentrated eluate from 2 mL culture.
S1 Table. Percentage of Soluble Protein Recombinants at Different Incubation Temperatures, Durations of Incubation, and Growth Stages of IPTG Induction.
S2 Table. Production of Soluble CTA1-(S14P5)4-DD, as indicated by at Different Cultivation Temperatures, Incubation Times, and Growth Stages of Induction
S3 Table. Production of Soluble CTA1-(S21P2)4-DD, as indicated by at Different Cultivation Temperatures, Incubation Times, and Growth Stages of Induction
Acknowledgments
The authors would like to express their sincere gratitude to Dr. Fx. Sudirman, Director of Biotis Pharmaceuticals Indonesia, for his invaluable support in the publication of this study.