Fast and efficient digestion of adeno associated virus (AAV) capsid proteins for liquid chromatography mass spectrometry (LC-MS) based peptide mapping and post translational modification analysis (PTMs)

Adeno-associated virus (AAV) represent a widely used delivery mechanism for gene therapy treatments currently being developed. The size and complexity of these molecules requires the development of sensitive analytical methods for detailed product characterization. Among the quality attributes that need to be monitored, characterization of the AAV capsid protein amino acid sequences and any associated post translational modifications (PTM) present should be performed. As commonly used for recombinant protein analysis, LC-MS based peptide mapping can provide sequence coverage and PTM information to improve product understanding and the development and deployment of the associated manufacturing processes. In the current study, we report a fast and efficient method to digest AAV5 capsid proteins in only 30 minutes prior to peptide mapping analysis. The performance of different proteases in digesting AAV5 was compared and the benefits of using nanoflow liquid chromatography for separation prior to high resolution mass spectrometry to obtain 100% sequence coverage are highlighted. Characterization and quantitation of PTMs on AAV5 capsid proteins when using pepsin as a single protease is reported, thereby demonstrating the potential of this method to aid with complete characterization of AAV serotypes in gene therapy development laboratories.


Introduction
Adeno-associated virus (AAV) gene therapy products, represent an area of growing interest due to their ability to deliver target genes to human cells, generating long term gene expression with minimal evidence of toxicity or invocation of an immune response [1,2]. However, the biological complexity found in several known AAV serotypes poses a challenge in understanding and optimizing manufacturing processes, increasing the pressure for the development of analytical methods that can perform accurate measurement of capsid protein sequence variability and stability [3].
The icosahedral capsid of AAV is formed by 60 copies of 3 capsid proteins, namely VP1, VP2 and VP3, present in a ratio of approximately 1:1:10, respectively [3]. Although serotypespecific differences in sequence composition have been elucidated previously [4], more information, especially concerning post translational modifications (PTMs), is required. This knowledge could lead to a better understanding of biological differences between serotypes in terms of infectivity, cell targeting and tissue specificity [3,5,6] but also, their molecular stability, which diverges among different serotypes [7][8][9].
Being able to efficiently characterize VPs is crucial for knowledge-driven improvements of AAV products, such as mixed serotype capsids that can outperform treatments using single serotype AAVs [3,10].
LC-MS based peptide mapping is commonly used in the biopharmaceutical industry to verify the primary structure of a therapeutic protein and to monitor multiple quality attributes which can impact safety and efficacy of the final product. With recent advances in mass spectrometric instrumentation and software platforms, increased accuracy, reproducibility and robustness are now part of efficient peptide mapping workflows [11]. While trypsin is commonly used for proteolytic digestion, it has been found to inefficiently digest VPs of AAV2, requiring alternative proteases, such as Lys C and Asp N, to ensure high sequence coverage [4]. Hence, extensive sample preparation is required, diminishing interest from fast paced pharmaceutical and biotechnological environments.
In this study, we present a semi-automated sample preparation workflow, reducing required hands-on time, while increasing robustness and reliability of generated data. We examined the performance of three proteases, namely, trypsin, chymotrypsin and pepsin, ensuring complete digestion and full sequence coverage of AAV5 capsid proteins in as little as 30 minutes. The chromatographic performance of both, analytical and nano scale LC, and their impact on capsid protein sequence coverage was compared. The resulting workflow demonstrates an easy and fast way to conduct peptide mapping of VP proteins and enabling analysis of PTMs which can ultimately support manufacturing processes and clinical trials of novel AAV-based gene therapy products.

Analytical instrumentation
LC-MS analysis was performed using either an UltiMate 3000 RSLCnano system or a Vanquish Flex UHPLC system coupled to a Q Exactive Plus Hybrid quadrupole-Orbitrap mass spectrometer (Thermo Scientific, Bremen, Germany) by means of an EASY-Spray source or by using an Ion Max API source equipped with a heated electrospray ionization (HESI-II) probe (Thermo Fisher Scientific, Bremen, Germany). All data were acquired using Thermo Scientific's Chromeleon Chromatography Data System software 7.2.9 (Thermo Scientific, Germering, Germany).

Sample preparation for peptide mapping
Four micrograms of AAV5 (4 × 10 11 vg/mL viral particles) were digested in triplicates using SMART Digest magnetic bead bulk trypsin, chymotrypsin, and pepsin kits. For each digestion, the sample, respective digestion buffer and 5mM TCEP were added to a 96-deepwell plate (Thermo Fisher Scientific, Vantaa, Finland). Samples were incubated for 30 min at 70ºC on medium mixing speed to prevent sedimentation of the beads using a Thermo Scientific KingFisher Duo Prime purification system with Thermo Scientific BindIt software version 4.0.
At the end of the incubation period, beads were removed, and the remaining volume of the sample was transferred to 1.5 mL Eppendorf protein LoBind tubes (Eppendorf, Dublin) followed by acidification with trifluoroacetic acid (TFA) at a final concentration of 0.1%.
Samples were evaporated to dryness and subsequently dissolved in 0.1% FA in water prior to LC-MS analysis.

LC-MS analysis
Resulting peptides were separated using a Thermo Scientific Easy-Spray PepMap RSLC C18 2µm column, 75µm x 50 cm and a Thermo Scientific Hypersil Gold RP, 3 µm, 1mm x 150 mm column (Thermo Fisher, Sunnyvale, CA, USA). Analysis was performed using a binary gradient of 0.1% (v/v) FA in water (A) and 0.1% (v/v) FA in acetonitrile (B). For analysis, 200 ng and 1µg of digested peptides were injected for each column, respectively.

PepMap RSLC C18 nano LC-MS experimental conditions
Gradient conditions for the Pepmap RSLC C18 were as follows: 2% B to 40% B in 105 min, increased to 60% B in 5 min, increased to 80% in 1 min followed by an isocratic hold for 9 min, followed by 2 linear gradients of 2% to 80% in 5 min, with an isocratic hold for 5 min.
Afterwards, the column was re-equilibrated to starting conditions for 20 min. The used flowrate was 250 nL/min and the column temperature was maintained at 45ºC. was done using a normalised collision energy set to 28 using a dynamic exclusion for 45 s and charge exclusion set to unassigned and > 8. MS instrumental tune parameters were set as follows: spray voltage was set to 1.7 kV, capillary temperature was 310ºC and S-lens RF voltage set to 50.

Hypersil Gold RP LC-MS experimental conditions
Gradient conditions for the Hypersil Gold RP were as follows: 2% B to 40% B in 105 min, increase to 60% B in 5 min, increase to 80% in 1 min followed by an isocratic hold for 14 min.
Column was re-equilibrated at 2% B for 10 min at flow rate of 100 µL/min and column temperature was maintained at 60ºC.
Data-dependent acquisition (DDA)-MS analysis of top 10 most abundant ions was performed in positive ion mode as above apart from the AGC target which was set to 3.0 × 10 6 and the dynamic exclusion time which was set to 7 s. MS instrumental tune parameters were set as follows: spray voltage was set to 3.8 kV, sheath gas flow rate was 25 arbitrary units (AU), auxiliary gas flow rate was 10 AU, capillary temperature was 320ºC, probe heater temperature was 150ºC and S-lens RF voltage set to 60.

Data processing
Peptide identification and PTM relative quantitation were performed using the Thermo Scientific BioPharma Finder software version 4.1 (Thermo Scientific, San Jose, CA, USA) using parameters summarised in Table S1. The results were filtered in two stages: 1) for sequence coverage evaluation and 2) for PTMs relative quantitation. The list of identified peptides was filtered to exclude nonspecific generated peptides, unknown modifications and gas phase ions. Peptide lists included all the peptides within ± 5 ppm accuracy, 1 × 10 5 average MS area and a confidence score ≥ 95%.
The sequence coverage percentage for each VP protein is reported for each experimental condition and a final list of evaluated PTMs with relative abundances ≥ 1% are reported for the pepsin digested samples.

Fast and efficient digestion of AAV capsid proteins.
In the present study, a sample preparation workflow for peptide mapping of AAV capsid proteins was optimized comparing different commonly used proteases, trypsin, chymotrypsin, and pepsin. Employing enzymes immobilised on magnetic beads allowed for automation, resulting in minimized hands-on time while yielding in excellent reproducibility in only 30 minutes.
An assessment of the SMART Digest trypsin kit following a peptide mapping protocol commonly used for monoclonal antibodies (mAbs) as reported by Millán Martín et al. [12] was conducted to gather preliminary results regarding digestion efficiency for AAV5. The results of this analysis showed insufficient digestion of AAVs with a sequence coverage below 50% (data not shown), clearly highlighting the need to improve sample preparation conditions. Furthermore, the generally low concentration of AAV samples led to the conclusion that higher sensitivity for LC-MS analysis would be beneficial. Hence, the use of nanoflow LC separation was explored.
Pepsin has low specificity with multiple cleavage sites; thus, offering a better chance to fully digest AAV capsid proteins. However, reproducibility is often noted as being problematic [13,14]. The Thermo SMART Digest kit aims at resolving this problem, by removing the protease from the sample thereby rapidly stopping the reaction in a controlled manner resulting in highly reproducible results. Figure 1 shows the base peak chromatogram (BPC) obtained from peptide mapping experiments using the PepMap RSLC C18 column, which highlights how pepsin was able to more efficiently digest AAV5, producing populated chromatograms showing a larger population of peptide peaks while also covering areas not addressed by either trypsin or chymotrypsin.
To ensure reliability of results, digestions were performed in triplicate. The total sequence coverage was evaluated for all three capsid proteins of AAV5 comparing the different proteases (Table 1). Our data demonstrates that pepsin provides a fast and efficient digestion of AAV5 capsid proteins, resulting in a sequence coverage of 100% for all three VPs in just 30 min of digestion (Table 1). This represents a considerable reduction of the required overnight incubation time reported in Rumachik et al. [15] without the need of additional sample preparation steps.
In comparison to pepsin, trypsin and chymotrypsin were not able to sufficiently digest the capsid proteins with sequence coverages of VP1 falling below 30.9 and 16%, respectively (Table 1 and Figure 2). Pepsin requires a much lower pH than the other two proteases to be active. Thus, the low pH used in the digestion buffer is potentially aiding capsid protein denaturation and unfolding which exposes the cleavage sites, ensuring complete digestion.   To ease visualization in BPF a filter of ≥ 1% minimum peptide recovery was applied reducing the number of peptides displayed in the sequence coverage map for pepsin ( Figure   S1). In BPF the recovery value displays the relative abundance of the modified peptide normalized against the peak areas of all peptides. Meaning higher-intensity peptides score more than lower-intensity peptides. Typically, a fair recovery is greater than 1%.
It is important to note that with this filter, sequence coverage was reduced to 96.5, 100 and 100% for VP1, VP2 and VP3, respectively, as two peptides in the N terminal of VP1 are only confirmed by peptides of less than 1% recovery ( Figure S1).

Quantitative analysis of PTMs in AAV5.
Relative levels of PTMs were studied for the pepsin digested and PepMap RSLC C18 resolved VP peptides only, as these were the only ones to achieve 100% sequence coverage.
PTM signatures were defined based on the differential (delta) mass values of modified versus unmodified peptides generated by pepsin digestion in any of the screened samples and identified automatically by BPF. Manual validation was performed in addition to software assignment to increase PTM identification confidence. Variable modifications were monitored as shown in Table S1. Importantly, using this method did not provide a specific capsid protein location for PTMs, as most of the VP1 sequence is shared with VP2 and VP3.
Relative levels of each PTM were studied and quantified using triplicate sample analysis.
After filtering the results as described in section 2.5, a total of 21 PTMs with at least 1% relative abundance were reported, with 5 sites identified as containing deamidations and 6 containing oxidations. In addition to deamidations and oxidations, 2 acetylation, 1 methylation, 2 isomerization, 1 amidation and 4 succinimide formation are reported with relative abundance values above 1% ( Table 2).
The top 10 most abundant modifications including S1 and S193 acetylation; N55, N92 and  To test the impact on chromatographic sensitivity and the possibility of using a more robust workflow for the identification of AAV peptides and PTMs, the same samples digested with pepsin were analysed using analytical flow LC separation (100 µL/min).
Thereby, sequence coverage was found to be reduced to 91.7, 91.3 and 100% for VP1, VP2 and VP3 respectively (Table 3) indicating a loss in sensitivity when using the higher flow rate, which also impacted PTM identification. Only 32 PTMs were identified using the Hypersil Gold RP even though 5 times more sample was injected for this analysis (Table S3).

Discussion
A quick and efficient peptide mapping protocol for AAV-based therapeutic products is key for informing upstream development processes regarding identifying amino acid sequence changes and PTMs, with the potential to affect product stability and efficiency.
The presented workflow demonstrates a significant improvement compared to previously reported sample preparation methods which may take up to 18 hours while also requiring larger amounts AAVs to produce satisfactory results [15]. Additionally, the number of manipulation steps required in traditional and more recent digestion protocols was also reduced [16]. By using a magnetic bead-based sample preparation for the digestion with pepsin, digestion is stopped in all samples at the exact same time, avoiding the production of different digestion patterns. Thereby, reproducibility is increased while also taking advantage of pepsin's lower specificity in comparison to trypsin, to produce quick results. Furthermore, pepsin has the advantage of being active at low pH. This aids the breakdown of the viral particles in association with the high temperatures used in the presented digestion protocol, helping to overcome the high thermal stability possessed by AAV5 that may deter protein proteolysis [17], ensuring digestion efficiency and high sequence coverage in a short amount of time.
Using the presented workflow resulted in 100% sequence coverage as well as the identification of 72 PTMs, 21 of each are confidently quantified with relative abundances above 1%. These represent quality attributes that can inform process development. While digestion with pepsin was carried out at low pH, which is known to reduce deamidation incidence [18], 2 sites (N55 and ~N571) had relative abundances above 10%. The high temperature (70ºC) used for the digestion could contribute to increasing deamidation levels.
However, a similar workflow was not seen to increase deamidation levels in mAbs [12].
Therefore, further investigations are necessary, to study the actual abundance of process induced deamidation in AAV samples.
Oxidation is also a PTM of interest to assess process induced variations. All 16 oxidation events detected had a relative abundance of below 2.5% except for M596. Oxidation can be a result of conditions encountered during purification, formulation and analysis as well as prolonged storage and frequent freeze-thaw cycles. Hence, the presence of oxidation constitutes a critical quality attribute, as oxidation can adversely impact the activity and stability of biotherapeutics [19].
Other PTMs like isomerization and succinimide formation are also of interest as they can alter protein structure and function, influencing binding and overall performance of the pharmacological product [20,21]. These processes can occur in vivo and in vitro and are influenced by environmental conditions. Thus, their monitoring is of pharmacological importance.
When it comes to AAV-based gene therapy products, limited information is available regarding PTM levels across different serotypes and the biological implication of such PTMs.
Employing intact protein analysis, Jin et al. [4] demonstrated that the first methionine of VP1 and 3 is cleaved off after synthesis, and the second amino acid of the sequence is subsequently acetylated. Using the presented peptide mapping workflow allowed for successful detection and relative quantitation of an acetylation on the suggested amino acid. Furthermore, Mary et al. [6] describes the detection of 9 different PTMs for AAV5 after tryptic digestion of AAV-293 cell generated viral capsids, including 3 ubiquination events, 1 phosphorylation, 1 SUMOylation and 4 glycosylation events. Our dataset confirms the identification of phosphorylation at S679 (S680 when the M at position 1 is considered), but the remaining modifications identified could not be confirmed using the present dataset. More recently, acetylation, deamidation, oxidation, methylation, and phosphorylation modifications were also identified in AAV5 after a 98% successful tryptic digestion [16]. General discrepancies on PTM abundance levels can be explained by differences in expression systems or overall downstream processing. Nevertheless, it is important to highlight that our workflow using pepsin digestion provides 100% sequence coverage in 30 min, giving a better insight into AAV5 sequence and associated PTMs.
In addition, the challenge posed by low concentrated AAV samples means that limited amounts of the biotherapeutic product are usually available for analysis. Thus, sensitivity becomes a critical parameter as demonstrated by our comparison between the use of analytical or nanoflow LC prior to MS detection. A trade off needs to be considered regarding sample availability and robustness, when deciding on what system to run the analysis on.
Moreover, pepsin poses its own challenges that need to be examined during data analysis. Due to its low specificity, applying a missed cleavage filter, which is commonly used for tryptic digests was deemed not appropriate. As smaller peptides are generated, an impact on peptide intensity spectra was also observed, which makes the data analysis challenging.
Thus, it was decided to only highlight PTMs found with at least 1% relative abundance.
However, a list of all identified PTMs is reported in table S2.

Conclusion
The results reported in the present study demonstrate a complete and efficient peptide mapping workflow to successfully monitor the amino acid sequence as well as PTMs of AAV capsid proteins (VPs). The aim of this work was to highlight the advantage of using pepsin over other more traditional proteases such as trypsin and chymotrypsin for peptide mapping analysis of AAVs. Additionally, it was shown that the use of nanoflow chromatography yielded superior sensitivity resulting in an increased sequence coverage as well as improved detection of low abundant PTMs.
Pepsin has been demonstrated to provide access to previously inaccessible areas of the AAV sequence generating fast and precise digestions in 30 minutes, that are easily controlled and stopped by removal of the enzyme at the end of the specified digestion time ensuring high reproducibility. We anticipate that our workflow will greatly benefit fast paced pharmaceutical environments that aim to quickly and completely characterize AAV based gene therapy products through peptide mapping analysis.   Table S1 BPF 4.1 software parameter settings for peptide mapping data analysis.

Figure S1
Pepsin sequence coverage map for AAV5 capsid proteins on the PepMap nano RSLC C18 column. The coloured bars show the identified peptides, with the numbers in the bars reflecting the retention time. The different colours indicate the peptide recovery in the MS1 scan: red >50%, orange >20% and yellow >10% represent good recovery. Green, >5%, light blue >2% and cyan >1% represent fair recovery.

Table S2
Complete list of PTMs identified, resolved on a PepMap RSLC C18 column and quantified for AAV5.

Table S3
Complete list of PTMs identified for AAV5 on Hypersil Gold RP column.

Figure S2
MS/MS spectrum with peak assignments and fragment coverage map for component (SFVDHPPDWLEEVGEGLRE) from Table S2 (S1+Acetylation from figure 3) using BioPharma Finder software.

Figure S3
MS/MS spectrum with peak assignments and fragment coverage map for component (SAGGGGPLGDNNQGADGVGNASGDWHCDSTWMGDRVVTKS TRT) from Table S2 (S193+Acetylation from figure 3) using BioPharma Finder software.     Figure S1 continued