Phase variable glycosylation in non-typeable Haemophilus influenzae

Non-typeable Haemophilus influenzae (NTHi) is a leading cause of respiratory tract infections worldwide and continues to be a global health burden. Adhesion and colonisation of host cells are crucial steps in bacterial pathogenesis, and in many strains of NTHi interaction with the host is mediated by the high molecular weight adhesins HMW1A and HMW2A. These adhesins are N-glycoproteins which are modified by cytoplasmic glycosyltransferases HMW1C and HMW2C. Phase variation in the number of short sequence repeats in the promoters of hmw1A and hmw2A directly affects their expression. Here, we report the presence of similar variable repeat elements in the promoters of hmw1C and hmw2C in diverse NTHi isolates. In an ex vivo assay, we systematically altered substrate and glycosyltransferase expression and showed that both of these factors affected the site-specific efficiency of glycosylation on HMW-A. Glycosylation occupancy was incomplete at many sites, variable between sites, and generally lower close to the C-terminus of HMW-A. We investigated the causes of this variability. As HMW-C glycosylates HMW-A in the cytoplasm, we tested how secretion affected glycosylation on HMW-A and showed that retaining HMW-A in the cytoplasm indeed increased glycosylation occupancy across the full length of the protein. Site-directed mutagenesis showed that HMW-C had no inherent preference for glycosylating asparagines in NxS or NxT sequons. This work provides key insights into factors contributing to the heterogenous modifications of NTHi HMW-A adhesins, expands knowledge of NTHi population diversity and pathogenic capability, and is relevant to vaccine design for NTHi and related pathogens.


Introduction
Haemophilus influenzae is a Gram-negative, non-motile coccobacillus that belongs to the Pasteurellaceae family. The bacterium is a commensal microbe that commonly resides within the upper airways of humans (1), but is also an opportunistic pathogen that can cause mild, acute, chronic, recurrent, localized or invasive diseases (2-5). It is mostly associated with upper and lower respiratory tract infections, otitis media, sinusitis, conjunctivitis, and chronic obstructive pulmonary disease (COPD) (5,6). H. influenzae can be divided into two distinct types: those encapsulated by a polysaccharide capsule and those that are unencapsulated, commonly termed non-typeable H. influenzae (NTHi) (2). There are six known capsular serotypes (serotypes a-f), of which serotype b (Hib) was once one of the primary causes of pneumonia, sepsis, epiglottitis, and bacterial meningitis (2,7). While the incidence of invasive diseases caused by Hib has been largely controlled through the conjugated Hib vaccine (2,8) (which is directed against the type-specific polysaccharide capsule), this vaccine does not protect against NTHi, which is now an emerging pathogen and one of the leading causes of invasive bacterial disease (9). As NTHi is a global health burden with neonates, young children, and the elderly as key vulnerable populations, there is a clear need to develop vaccines against it (2,10).
These adhesins are differentially distributed in H. influenzae isolates (25). Approximately 75% of clinical isolates possess the HMW adhesin system, while most of the remaining strains possess a Hia homolog that can still permit efficient attachment to cultured epithelial cells (25,32-34). Furthermore, in strains lacking HMW or Hia, the Hap adhesin may allow for adherence to epithelial cells (32).
The HMW-ABC N-glycosylation system of NTHi is unusual (Fig. 1). In most well-characterized systems, N-glycosylation occurs in the eukaryotic endoplasmic reticulum or bacterial periplasm and requires an oligosaccharyltransferase (OST) to transfer a preassembled oligosaccharide from a lipid donor to asparagine residues in substrate proteins (43-45). In contrast, NTHi employs HMW-C, a soluble cytoplasmic protein belonging to the GT41 family of glycosyltransferases, a family which is otherwise comprised of O-GlcNAc transferases (46). HMW-C-like enzymes directly transfer hexose monosaccharides from UDP-hexose to acceptor proteins within the bacterial cytoplasm (47-49). Interestingly, both OST and HMW-C enzymes preferentially glycosylate Asn residues in glycosylation sequons (N-X-S/T; X≠P) (50). The Actinobacillus pleuropneumoniae HMW-C glycosyltransferase has been reported to catalyze both N- and O-linked glycosyltransferase reactions (51). NTHi HMW-A is modified with mono-hexose or di-hexose glycan structures, suggesting that HMW-C is also capable of forming hexose-hexose O-glycosidic bonds (45,49,51). Some bacteria, such as A. pleuropneumoniae, contain 'orphan' HMW-Cs, in which hmwC and the gene encoding the target adhesin or autotransporter protein are at unlinked locations in the genome (52,53). In NTHi, hmwC is adjacent to the hmwAB operon, although expression of hmwAB and of hmwC is controlled by separate promoters. Furthermore, NTHi contains two homologous hmw-ABC loci (53): hmw1ABC and hmw2ABC.
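The sequon rule stated above (N-X-S/T, where X is any residue except proline) can be illustrated with a short pattern search. This is a minimal sketch rather than the method used in this study; the function name find_sequons and the toy sequence are assumptions introduced only for illustration, and the toy sequence is not taken from HMW1A.

```python
import re

# N-X-S/T sequon, where X is any residue except proline.
# A zero-width lookahead is used so that overlapping sequons
# (for example the "NNTS" stretch below) are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return (1-based Asn position, sequon tripeptide) for each N-X-S/T match."""
    seq = protein.upper()
    return [(m.start() + 1, seq[m.start():m.start() + 3])
            for m in SEQUON.finditer(seq)]


# Hypothetical toy sequence (an assumption, not an HMW1A fragment).
toy = "MNISAANPTGGNATKNNTS"
for position, tripeptide in find_sequons(toy):
    print(position, tripeptide)
```

In the toy sequence, the N-P-T motif is skipped because proline occupies the X position, while the two overlapping sequons near the end are both reported.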
Genetic analysis of hmw1A and hmw2A has revealed that the expression of NTHi HMW-A is regulated by phase variation in the number of 7-bp short sequence repeats (SSRs) (5'-CTTTCAT-3') within its promoter (54). This phase variation is thought to occur through slipped-strand mispairing and DNA polymerase slippage during replication of repetitive DNA sequences (54). The length of the promoter repeat tract is inversely correlated with the expression levels of HMW1A and HMW2A (54). As opposed to a typical "on/off" phase variable system, these promoter SSRs therefore allow for a gradient in HMW-A protein abundance and may contribute to heterogeneity in the bacterial population during infection (54). It is likely that phase variable expression of the HMW adhesins acts as an adaptive strategy in NTHi to aid in infection or evasion of host immune responses (55,56). Specifically, population-level diversity may allow for selection of bacterial populations that best adapt to quickly-changing or niche environments and therefore allow for long-term survival within the human host.
Describing the full potential diversity of post-translational heterogeneity is critical for understanding the structure and function of the HMW-A adhesins and for their use as vaccine candidates. Here, we therefore assessed the effect of variable substrate (HMW-A) and glycosyltransferase (HMW-C) abundance on site-specific glycosylation of HMW-A, and investigated factors that control the efficiency of site-specific glycosylation in the adhesin.

Bacterial strain growth conditions
NTHi R2846 hmw1AB and hmw1C genes were used in this study. Escherichia coli cells expressing the gene/s of interest were grown at 37 °C in Luria-Bertani (LB) broth or agar plates (2% Bacto-tryptone, 1% Yeast extract, 2% NaCl). Antibiotics used to select for and maintain plasmids were ampicillin (100 µg/mL) and kanamycin (50 µg/mL). Various concentrations of arabinose and isopropyl β-D-1-thiogalactopyranoside (IPTG) were added to the media to induce protein expression where appropriate.

Construction of plasmids
Oligonucleotide primers were designed to amplify DNA encoding hmw1AB incorporating NcoI and SacI restriction sites. PCR amplicons were digested with NcoI and SacI (New England Biolabs, Ipswich, Massachusetts), purified using the Sigma-Aldrich gel extraction kit, and ligated into NcoI- and SacI-digested pET28a(+). Cloning with NcoI caused a D2N substitution in HMW1A, which was reverted to the native sequence using site-directed mutagenesis (57).
The pBad-HMW1C plasmid previously constructed by Gawthorne (2014) (58) was used for this study. Site-directed mutagenesis (57) was used to introduce an S1046T point mutation in HMW1A and to delete the sequence encoding the HMW1A signal sequence (amino acids 1-68).
Top10 E. coli transformants were selected by growth on LB plates with kanamycin.

Experimental design and statistical rationale
Experiments were performed ex vivo using BL21 Rosetta cells containing the pBad-hmw1C plasmid (58). Cells were grown in LB media containing 100 µg/mL ampicillin and incubated at 37 °C until the cells reached mid-log phase. The cells were then chemically transformed with the pET28a(+) plasmid harbouring the hmw1AB gene, with transformants selected by growth on LB agar containing ampicillin and kanamycin. A single colony was then inoculated into 20 mL LB with 100 µg/mL ampicillin and 50 µg/mL kanamycin, and grown to an OD600 of 0.30. Varying amounts of arabinose and IPTG were added to the media, and cells were incubated with shaking at 37 °C. Cells were harvested at an OD600 of 1 by centrifugation and cell pellets frozen at -20 °C. In the titration assays, we varied expression of HMW1AB with a range of concentrations of IPTG (0.05 mM, 0.1 mM, 0.2 mM, 0.5 mM, or 1 mM) with 0.2% arabinose, and of HMW1C with a range of concentrations of arabinose (0.00002%, 0.0002%, 0.002%, 0.02%, 0.2%, or 2%) with 0.1 mM IPTG. For analysis of variant HMW1A, samples were prepared in triplicate to allow quantification and statistical analyses.
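The two titration series described above can be written out as an explicit list of induction conditions. The sketch below only restates the concentrations given in the text; the function name titration_conditions and the dictionary keys are assumptions made for illustration.

```python
# Inducer concentrations as stated in the text.
IPTG_MM = [0.05, 0.1, 0.2, 0.5, 1.0]                       # HMW1AB titration
ARABINOSE_PCT = [0.00002, 0.0002, 0.002, 0.02, 0.2, 2.0]   # HMW1C titration

FIXED_ARABINOSE_PCT = 0.2   # held constant while IPTG is varied
FIXED_IPTG_MM = 0.1         # held constant while arabinose is varied


def titration_conditions():
    """Enumerate both titration series as a list of condition dictionaries."""
    conditions = []
    for iptg in IPTG_MM:
        conditions.append({"series": "HMW1AB", "iptg_mM": iptg,
                           "arabinose_pct": FIXED_ARABINOSE_PCT})
    for arabinose in ARABINOSE_PCT:
        conditions.append({"series": "HMW1C", "iptg_mM": FIXED_IPTG_MM,
                           "arabinose_pct": arabinose})
    return conditions


for condition in titration_conditions():
    print(condition)
```

Listed this way, the combination of 0.1 mM IPTG and 0.2% arabinose appears once in each series, so it is the one inducer combination shared by both titrations.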

Whole cell extraction
Proteins were denatured and reduced by resuspending frozen cell pellets in 1 mL 6 M guanidinium chloride, 100 mM Tris HCl buffer pH 7.5, and 10 mM DTT and incubation at 30 °C for 30 min with shaking. Cysteines were alkylated by addition of acrylamide to a final concentration of 50 mM and incubation at 30 °C for 30 min with shaking. Proteins were precipitated by adding 20 µL of reduced/alkylated protein sample to 400 µL of 1:1 methanol: acetone and incubation at -20 °C for at least 4 h. Samples were centrifuged at 18,000 rcf for 10 min, the supernatant removed, centrifuged for 1 min, all supernatant removed, and the pellets air dried. Each pellet was resuspended in 100 µL 100 mM ammonium acetate with 1 µg trypsin and incubated at 37 °C for 16 h with shaking.

Mass spectrometry data analysis
Peptides and proteins were identified using ProteinPilot 5.1 (SCIEX), searching against E. coli K12, NTHi R2846 HMW1A, HMW1B, and HMW1C, trypsin and common contaminants (4500 total proteins), with settings: Sample type, identification; Cysteine alkylation, acrylamide; Instrument, TripleTof 5600; Species, none; ID focus, biological modifications; Enzyme, trypsin; Search effort, thorough ID. False discovery rate analysis using ProteinPilot 5.1 (SCIEX) was performed on all searches. Peptides identified with greater than 99% confidence and with a local false discovery rate of less than 1% were included for further analysis. MS/MS fragmentation spectra and peak selection were manually inspected where required using PeakView 2.1 (SCIEX), with settings: Shared peptides, allowed; Peptide confidence threshold, 99%; False discovery rate, 1%; XIC extraction window, 6 min; XIC width 75 ppm. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (63) partner repository with the dataset identifier PXD015046. Normalised protein abundance was calculated as the ratio of the intensity of the protein of interest (HMW1A or HMW1C), to the sum of all proteins (HMW1A, HMW1B, HMW1C, trypsin, and all detected E. coli proteins) (64). Glycan occupancy was measured as the ratio of the intensity of the glycosylated peptide to the sum of the intensities of the glycosylated and unmodified peptides (59). Extracted ion chromatograms (XICs) were generated using PeakView (SCIEX).
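The two ratios defined above can be expressed directly. The following is a minimal sketch that assumes peak intensities are already available as plain numbers; the function names and the example values are hypothetical and are not data from this study.

```python
def glycan_occupancy(glycosylated, unmodified):
    """Occupancy = glycosylated / (glycosylated + unmodified) peptide intensity."""
    total = glycosylated + unmodified
    if total <= 0:
        raise ValueError("no signal for either peptide form")
    return glycosylated / total


def normalised_abundance(protein_intensity, all_protein_intensities):
    """Intensity of the protein of interest divided by the sum over all proteins."""
    total = sum(all_protein_intensities)
    if total <= 0:
        raise ValueError("total protein intensity must be positive")
    return protein_intensity / total


# Hypothetical intensities, chosen only to show the arithmetic.
print(glycan_occupancy(3.0, 1.0))                  # 3 / (3 + 1)
print(normalised_abundance(2.0, [2.0, 3.0, 5.0]))  # 2 / (2 + 3 + 5)
```

Both values are bounded between 0 and 1. An occupancy cannot be computed for a site at which neither peptide form is detected, which is why the sketch raises an error in that case instead of returning zero.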

DNA sequence and protein structural analysis
Tandem Repeats Finder (65) was used to find and analyse short sequence repeats (SSRs) in the promoter regions of the genes encoding HMW1A, HMW1C, HMW2A, and HMW2C of NTHi (66). The structural context of N-glycosylation sites on HMW1A was predicted using Jalview (67) and the JPred protein secondary structure prediction server, based on JNETPRED (68).
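For a repeat unit that is already known, the number of tandem copies in a promoter region can be counted with a simple pattern search. This sketch is a simplified stand-in and does not reproduce the Tandem Repeats Finder algorithm, which detects repeats without prior knowledge of the unit and tolerates mismatches. The function name longest_repeat_tract and the toy DNA string are assumptions; the toy string is not a real NTHi promoter sequence.

```python
import re


def longest_repeat_tract(dna, unit):
    """Return (0-based start, copy number) of the longest uninterrupted
    tandem tract of `unit` in `dna`, or (-1, 0) if the unit is absent."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    # The lookahead tests every start position, so tracts that begin
    # inside an earlier match are still considered.
    pattern = re.compile("(?=((?:" + re.escape(unit.upper()) + ")+))")
    best_start, best_copies = -1, 0
    for match in pattern.finditer(dna.upper()):
        copies = len(match.group(1)) // len(unit)
        if copies > best_copies:
            best_start, best_copies = match.start(), copies
    return best_start, best_copies


# Hypothetical toy sequence built from the 7-bp unit named in the text.
toy = "GG" + "ATCTTTC" * 3 + "GG" + "ATCTTTC" + "AA"
print(longest_repeat_tract(toy, "ATCTTTC"))
```

Only exact, uninterrupted copies are counted, so a single mismatch inside a tract would split it into two shorter tracts.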

Presence of a variable number of repeat elements in the promoters of hmwA and hmwC
Previous studies reported the presence of variable repeat elements upstream of hmw1A and hmw2A in NTHi (27,29,54). We investigated the presence and diversity of variable SSRs in the promoter regions of hmw1/2A from five different strains: R2846, 2019, NCTC8143, 86-028NP, and PittEE. Multiple sequence alignment of the region immediately upstream of the hmw1/2A loci showed that the previously reported 7-bp SSR (5'-CATCTTT-3' (54), referred to here as 5'-ATCTTTC-3') was present in all five strains (Fig. 2). However, the precise number of repeats varied between NTHi strains and between hmw1A and hmw2A, consistent with previous findings (27). This variation likely reflects rapid phase variation in repeat length rather than consistent differences between strains.
Inspection of the hmw1C and hmw2C loci in these same five strains also revealed the presence of repeat elements immediately upstream of hmw1/2C, but with different sequences to those observed at the hmw1/2A loci (Fig. 2). Out of the ten hmw1C and hmw2C promoter sequences    3A). The converse was also possible, keeping constant HMW1A abundance with 0.1 mM IPTG and increasing HMW1C abundance by increasing the concentration of arabinose from 0.00002% to 2% (Fig. 3B). This confirmed that the absolute and relative abundance of the HMW1C glycosyltransferase and its HMW1A glycoprotein substrate could be controlled in our heterologous expression system.

Site-specific glycosylation occupancy of HMW1A
We next performed a qualitative and quantitative assessment of glycosylated and unmodified tryptic peptides from HMW1A that were detected by DDA LC-ESI-MS/MS. We did not identify any peptides from the signal peptide region of HMW1A. Peptides belonging to the pro-piece (HMW1A-PP) were only identified in their non-glycosylated form, while numerous glycosylated and non-glycosylated peptides were identified from mature HMW1A. Coexpression of HMW1A and HMW1C allowed for robust detection of 45 sequon-containing peptides from HMW1A including 11 peptides with one sequon glycosylated with a single hexose and 19 unglycosylated peptides with one sequon (Table 2, Supplementary Figures S1-S18). In addition, we identified 8 distinct peptides and glycopeptides with more than one sequon. All glycosylation events were observed on Asn residues in sequons. Fig. 4 shows MS/MS spectra identifying tryptic peptides containing Asn484 in its glycosylated and unglycosylated forms.
Table legend: Hexose-glycosylated Asn residues are indicated in bold. Asn residues in sequons that are not modified are in italics. All peptides were detected at a confidence level >99%. z, charge state. m/z, mass-to-charge ratio. ∆mass, difference between measured and theoretical mass. Glycosylation site assignment was based on the presence of a sequon. *, peptide containing more than one sequon with ambiguous hexose-Asn site assignment.
Figures S19 and S20).

Titrating HMW1A substrate concentration affects site-specific glycan occupancy
Phase variation in the number of repeats in the promoter of hmw1A affects its expression (54). Using our ex vivo system, we therefore tested if changing HMW1A abundance also affected its site-specific glycosylation occupancy. We kept the abundance of the HMW-C glycosyltransferase fixed with 0.2% arabinose, controlled the abundance of HMW1A by addition of IPTG from 0.05 mM to 1 mM, and then used SWATH-MS to measure the site-specific glycosylation occupancy at 18 detectable sequons in HMW-A. This analysis showed that glycan occupancy in HMW-A was influenced by the concentration of the substrate protein in the cellular environment (Fig. 4 and 5).
We observed that as the abundance of the HMW1A protein substrate increased, site-specific glycan occupancy decreased at each site of modification in HMW1A (Fig. 5A). Our results were consistent with saturation of the HMW1C enzyme at high concentrations of HMW1A protein substrate. That is, at low HMW1A protein concentrations there was sufficient HMW1C enzyme to modify available sites in HMW1A, while at high expression levels of the HMW1A protein substrate the HMW1C enzyme had reduced capacity to modify acceptor sites. Our results therefore showed that titrating the expression of the HMW1A protein substrate could vary its site-specific extent of glycosylation.

Titrating HMW1C glycosyltransferase concentration affects site-specific glycan occupancy on HMW1A
The presence and variation in the number of SSRs in the promoter region of hmw1C (Fig. 2B) suggested that HMW1C expression is influenced by the length of the repeat tract. We therefore studied the effect of variable HMW1C glycosyltransferase concentration on site-specific glycosylation occupancy of HMW1A. We varied HMW1C expression by addition of different amounts of arabinose (0.00002-2%), while the abundance of HMW1A was kept constant with 0.1 mM IPTG. We observed that varying the concentration of HMW1C resulted in changes in the extent of glycosylation across various sites in HMW1A (Fig. 5B). We observed a direct relationship between the concentration of the glycosyltransferase and glycosylation efficiency, whereby glycan occupancy in HMW1A increased with increasing HMW1C expression, until it asymptotically reached a point of saturation. This effect was consistent with an increased availability of glycosyltransferase increasing the efficiency of glycosylation of HMW1A.
However, the maximum glycan occupancy in HMW1A (observed with 0.1 mM IPTG and 0.2% arabinose) did not reach greater than 90%, suggesting that factors apart from HMW1C abundance limited glycosylation efficiency.

Glycosylation occupancy decreases towards the C-terminus of HMW1A
The pattern of glycosylation occupancy along the length of HMW1A was striking, with no glycosylation detected in HMW1A-PP, efficient glycosylation throughout most of the length of mature HMW1A, and lower occupancy glycosylation towards the C-terminus of the protein (Fig. 6). Decreased glycosylation efficiency close to the C-termini of proteins has also been reported for secretory eukaryotic glycoproteins (72,73). While we observed a general decrease in site-specific glycan occupancy across HMW1A, the limited number of sequons we detected meant that the differences in glycosylation efficiency along the length of the protein did not reach statistical significance.
We next investigated factors that might influence glycosylation efficiency at the C-terminus of HMW1A. The efficiency of eukaryotic co-translocational N-glycosylation decreases near the C-terminus of the substrate protein because polypeptide chain termination occurs before the acceptor site in the nascent polypeptide reaches the OST active site (73). Once translation of the polypeptide has been completed it is released from the ribosome and is therefore more rapidly translocated through the translocon and past the OST (73,74). Although evolutionarily distinct from eukaryotic OST, a similar mechanism may be relevant to the HMW-ABC system. Glycosylation in NTHi occurs in the cytoplasm, and therefore once HMW-A is translocated into the periplasm it can no longer be glycosylated by HMW-C. It is therefore likely that secretion of HMW-A into the periplasm influences the glycosylation efficiency of HMW-A. Furthermore, the glycosylation of sequons located near the C-terminus of HMW-A would be more likely to be affected by this process. We therefore tested if HMW1A secretion affected the glycosylation efficiency of HMW1A. We created a variant of HMW1A lacking the residues corresponding to the signal peptide (amino acids 1-68), and co-expressed this truncated HMW1A together with HMW1C in BL21 E. coli cells with 0.1 mM IPTG and 0.2% arabinose. Whole cell extracts were prepared, tryptic digests analysed using LC-ESI-MS/MS, and SWATH-MS used to quantify glycosylation occupancy across HMW1A. Deletion of the HMW1A signal peptide did not specifically improve glycosylation at the C-terminus but rather globally increased glycosylation efficiency across all glycosylated Asn sites: nine out of ten partially modified sites were significantly more efficiently glycosylated (P<0.05, Student's t-test) with deletion of the HMW1A signal peptide (Fig. 6A and B). All of the sequons that were not glycosylated in full length HMW1A remained unglycosylated in the variant lacking a signal peptide. Overall, our results suggested that when HMW1A is retained in the bacterial cytoplasm, glycosylation efficiency increases across the full length of the protein. This increase in efficiency may be due to increased exposure and interaction between the protein substrate and glycosyltransferase, and the absence of competition between glycosylation and secretion. These findings also suggest that although protein secretion limits glycan occupancy, this is not the cause of the inefficient glycosylation observed at the C-terminus of HMW1A, nor of the quantitative differences in glycosylation occupancy between different sequons throughout the protein.

HMW1C efficiently glycosylates both NXS and NXT sequons
NXT sequons are more efficiently glycosylated than NXS sequons in a number of N-glycosylation systems (75), due to high affinity of Thr at the +2 position to the peptide binding site in OST (69). Upon inspection of HMW1A, we noted that there was an enrichment of NXT sequons and a depletion of NXS sequons towards the C-terminus of HMW1A (P-value = 0.011, Mann-Whitney) (Fig. 6C). This potentially correlated with the lower glycosylation occupancy we observed at sites close to the C-terminus of HMW1A (Fig. 6A) 6F). This suggested that the presence of serine or threonine at the +2 position of a sequon does not have a strong effect on its extent of glycosylation by HMW-C.

Discussion
Expression of the HMW-A adhesin glycoprotein is highly variable in NTHi due to phase variation in the number of repeat elements in its promoter. Here, we established that this sequence variability is a feature of the promoter regions of hmwA and also of hmwC in diverse NTHi genomes (Fig. 2). This suggests that phase variable expression of hmwA and hmwC is a common feature of NTHi.
By independently titrating the expression of HMW1A and HMW1C in an ex vivo system, we showed that varying the abundance of the HMW1A glycoprotein substrate or of the HMW1C glycosyltransferase quantitatively affected site-specific glycosylation across HMW1A (Fig. 5).
Site-specific glycosylation occupancy was increased either by increasing the abundance of the HMW1C glycosyltransferase, or by decreasing the abundance of the HMW1A glycoprotein.
Phase variation in coding regions of genes that affects the presence or absence of the respective protein products is common in bacterial pathogens to enable immune evasion (76-78).
Extending this paradigm, we propose that phase variation in the site-specific quantitative extent of glycosylation across the many glycosylation sites in HMW-A results in a similar diversification in proteoforms presented on the bacterial cell surface in a population of NTHi.
This phase variability in the extent of glycosylation may then play a key role in immune evasion, and is also relevant for consideration of the use of HMW-A and other bacterial glycoproteins as vaccine candidates.
Even under conditions which provided the highest extent of glycosylation we could always detect some fraction of each site that was not modified. This suggested that factors other than the ratio of HMW1C glycosyltransferase to HMW1A glycoprotein substrate are important in controlling the extent of glycosylation in our ex vivo system. Amongst other potential factors, the extent of site-specific glycosylation depended on the position within HMW1A (Fig. 6). We identified peptides from HMW1A-PP and mature HMW1A, but no peptides from the signal peptide. This is consistent with rapid degradation of the signal peptide after cleavage by signal peptidase. When we investigated the structural context of sites that were efficiently and inefficiently modified, we found no difference in the localization of glycosylated and non-glycosylated sites on loop structures or secondary structural elements. The sequons located in the HMW1A-PP (at Asn120 and Asn183) were not glycosylated, but were also present in loops between β sheets. This suggests that aspects of local protein sequence or structure of the HMW1A-PP, or its accessibility to HMW1C, inhibited glycosylation in this region of the protein. It is also possible that lack of glycosylation of HMW1A-PP is critical for recognition by the HMW1B translocator pore or for translocation (79). This is plausible considering TpsA proteins contain a highly conserved ~250 amino acid TPS domain essential for protein secretion (79,80).
We observed that glycosylation sites close to the C-terminus of HMW1A were generally inefficiently glycosylated (Fig. 6). When we deleted the signal sequence of HMW1A, the C-terminus of HMW1A remained relatively inefficiently glycosylated but there was a general increase in site-specific glycan occupancy across the full length of HMW1A (Fig. 6A and B).
This indicated that secretion into the periplasm competes with glycosylation by HMW1C, but that this was not the underlying cause of inefficient glycosylation towards the C-terminus of HMW1A. We observed an enrichment of NXT sequons close to the C-terminus of HMW1A, perhaps evolved to increase the efficiency of glycosylation in this region of the protein (Fig. 6C). However, when we tested this by experimentally changing an NXS sequon to an NXT sequon in HMW1A, we found no significant difference in glycan occupancy between the native N1044IS1046 and variant N1044IT1046 (Fig. 6D-F). The relevance of inefficient glycosylation towards the HMW-A C-terminus remains unclear.
In summary, we have established that variable expression of HMW-A and HMW-C quantitatively influences glycosylation occupancy across diverse glycosylation sites in HMW-A. We predict that phase variable site-specific glycosylation facilitates antigenic escape or modulates adhesion by NTHi, and is therefore key in pathogenesis.