DIALib: an automated ion library generator for data independent acquisition mass spectrometry analysis of peptides and glycopeptides

Data Independent Acquisition (DIA) Mass Spectrometry (MS) workflows allow unbiased measurement of all detectable peptides from complex proteomes, but require ion libraries for interrogation of peptides of interest. These DIA ion libraries can be theoretical or built from peptide identification data from Data Dependent Acquisition (DDA) MS workflows. However, DDA libraries derived from empirical data rely on confident peptide identification, which can be challenging for peptides carrying complex post-translational modifications. Here, we present DIALib, software to automate the construction of peptide and glycopeptide Data Independent Acquisition ion Libraries. We show that DIALib theoretical ion libraries can identify and measure diverse N- and O-glycopeptides from yeast and mammalian glycoproteins without prior knowledge of the glycan structures present. We also show that DIALib libraries consisting only of glycan oxonium ions can quickly and easily provide a global glycosylation profile of the “oxoniome” of glycoproteomes. DIALib also enables the study of post-translational modifications beyond glycosylation, in isolation or combination, in complex proteomes from diverse biological and clinical samples.


Introduction
Key challenges in bottom-up mass spectrometry (MS) experiments are peptide identification and quantification. The wide range of protein abundance in complex samples and the presence of post-translational modifications reduce the efficiency of both peptide identification and quantification. One of the most common post-translational modifications in proteins is glycosylation [1,2]. Protein glycosylation is also particularly difficult to analyse by MS, due to the natural site-specific heterogeneity in glycan structure and occupancy [3]. Site-specific variation in the structure or occupancy of glycans can lead to major changes in glycoprotein stability, folding, and function in physiological, pathological, and biotechnological situations [3-10]. Driven by the importance of glycans in biology, substantial progress has been made in analytical LC-MS/MS methods to identify and quantify glycan heterogeneity in complex samples [11,12].
Data Independent Acquisition (DIA) is a major recent development in mass spectrometry proteomics. In DIA, the decision to fragment a particular precursor is made not based on the precursor's intensity, but rather on its presence within predefined m/z windows [13,14]. One implementation of DIA is sequential window acquisition of all theoretical mass spectra (SWATH) [13]. DIA workflows are powerful for unbiased measurement of the abundance of all detectable peptides from complex proteomes. However, DIA workflows depend on the construction of ion libraries that allow measurement of specific precursor and fragment ion m/z pairs (a.k.a. transitions) with specific RTs. These DIA ion libraries are routinely built from peptide identification data obtained through Data Dependent Acquisition (DDA) workflows. The main limitation of these empirical DDA-derived libraries is that the peptides identified passed the intensity cut-off requirements for DDA fragmentation and could be identified using database searching software. These two requirements limit the potential of DIA proteomics, especially for investigation of post-translationally modified peptides. One strategy to circumvent this limitation is to build theoretical ion libraries that include all the desired transitions that would result from fragmentation of peptides modified with post-translational modifications of interest.
We previously used a SWATH DIA approach to measure relative differences in glycan occupancy and structure in yeast cell wall proteins as a consequence of defects in the N-glycosylation machinery [5]. This approach used a manually constructed ion library composed of b and y peptide ions and glycan oxonium ions. We observed that while all mutants showed distinct and significant differences in glycan occupancy compared to the wild-type strain, the mannosyltransferase mutants (alg3Δ, alg9Δ, and alg12Δ) also displayed significant changes in glycan structure due to defects in the addition of mannose residues to the glycan branches [5].
However, because the manually curated library depended on our ability to manually identify glycopeptides with multiple glycoforms in the DDA data, the ion library generated most likely underrepresented the diversity and heterogeneity of the yeast cell wall glycoproteome.
An alternative way to obtain a more comprehensive representation of glycoproteome diversity is to measure glycopeptides using Y ions present in MS/MS fragmentation spectra. Y ions are high intensity ions produced during glycopeptide fragmentation with CID/HCD, and consist of the entire precursor peptide sequence with 0, 1, or more monosaccharide residues attached to the glycosylation site (termed Y0, Y1, Y2, etc) [15]. Y ions are a common fragment ion to all glycoforms of the same glycopeptide, independent of glycan structure, and are thus an ideal fragment ion to use to detect and measure glycan structural heterogeneity without prior information on glycan structures or glycopeptide precursor m/z [15]. This approach has been shown to be effective in identifying common and rare glycoforms in a glycopeptide-enriched serum sample [15].
The manual construction of spectral libraries can be tedious and time consuming. Here, we present DIALib, software to automate the construction of peptide and glycopeptide Data Independent Acquisition ion Libraries, for use in DIA analyses with Peakview (SCIEX). We show that DIALib theoretical Y libraries can identify and measure N- and O-glycopeptides with and without prior knowledge of the glycoforms present and of their retention times, and we discuss the utility of complementing Y libraries with b and y peptide fragment ions. We also show that DIALib libraries consisting only of glycan oxonium ions can quickly and easily provide a global glycosylation profile of DIA samples.

Library construction
DIALib is software that can generate customized ion libraries to use in Peakview (SCIEX) (https://github.com/bschulzlab/DIALib). Briefly, the software processes the input protein or peptide sequences to generate ion libraries containing peptide b and y ions, glycopeptide Y ions, and/or oxonium ions of choice, and allows the user to select the retention times (RT) to be used for the peptide(s) and precursor masses. Key features of the software are described in detail below.

Input data
The input data for the software is one or more protein or peptide sequences in FASTA format, or Uniprot identifiers (www.uniprot.org). The FASTA file or text input is first processed to retrieve protein identification and amino acid sequence. DIALib can perform in silico protease digestion of the input sequences, allowing selection of specific peptides of interest for downstream processing and inclusion in the library. The user can select from a range of default settings or input a customized setting of choice (Table 1), including: a) Modifications: static (e.g. propionamide or carbamidomethyl), variable Y-type (e.g. Y0, Y1, and/or Y2 ions for Hex or HexNAc), and variable non-Y-type (e.g. phosphorylation, sulfation, and carboxylation, among others); b) RT (unique or multiple selection); c) m/z windows, corresponding to the Q1 value (normally, the precursor m/z values) (unique or multiple selection); d) Maximum ion fragment charge; e) Oxonium ions (unique or multiple selection); and f) Fragment ion types, corresponding to the Q3 values (normally, the fragment ion m/z values) (b, y, or Y). Finally, the user can choose to generate a library that contains multiple RT, or multiple m/z windows, or both.
RT selection: the user can select specific RT (if prior RT information is available) or a range of RTs (if no prior RT information is available). Q1 selection: DIALib allows the user to input specific Q1 values or to select a range of m/z windows in which specific transitions will be measured. Multiple Q1s are especially important when no prior information of Q1 is available, or when multiple Q1 for the same peptide are expected due to the presence of PTMs with unpredictable m/z (e.g. glycosylation). If multiple Q1s are selected, the value of Q1 that appears in the library is the middle value of the chosen m/z range (e.g. if the m/z window chosen is 400-425 m/z, the ion library will show a Q1 value of 412.5).
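The midpoint rule for Q1 described above can be illustrated with a minimal Python sketch. The function name `q1_for_window` and the example window list are hypothetical and are not taken from the DIALib source code; only the 400-425 m/z example comes from the text.

```python
def q1_for_window(lower_mz: float, upper_mz: float) -> float:
    """Return the Q1 value reported for an m/z window: the middle of the range."""
    if upper_mz <= lower_mz:
        raise ValueError("upper_mz must be greater than lower_mz")
    return (lower_mz + upper_mz) / 2.0


# Illustrative windows only; a real library would use the windows of the DIA method.
windows = [(400.0, 425.0), (425.0, 450.0), (450.0, 475.0)]
q1_values = [q1_for_window(lo, hi) for lo, hi in windows]
print(q1_values)  # [412.5, 437.5, 462.5]
```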
Q3 selection: Libraries made of b and y ions (by libraries) do not contain the b1, y1, and full length b ion. It is possible to suggest a stop position within the peptide to limit the b or y series, and in this way b and y transitions are only generated until the stop position. It is also possible to select specific b and y transitions for each peptide. The type and number of transitions used in the library can be modified for each peptide. In the case in which the library contains b, y, and either Y ions or variable post-translational modifications, there is an option to generate b and y transitions that correspond to either the modified or unmodified query peptide (i.e. they contain or not the mass of the post-translational modification that is added to the Y ion series).
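As an illustration of how a by library could be enumerated, the following Python sketch computes theoretical b and y ion m/z values for an unmodified peptide from standard monoisotopic residue masses, omitting b1, y1, and the full-length b ion as described above. The function name `by_transitions` is hypothetical, static and variable modifications and the stop position are not handled, and retaining the full-length y ion is an assumption based on a literal reading of the text. It is a sketch, not the DIALib implementation.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def by_transitions(peptide: str, max_charge: int = 1):
    """Return (label, charge, m/z) tuples for b2..b(n-1) and y2..yn."""
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)
    ions = []
    for z in range(1, max_charge + 1):
        for i in range(2, n):  # skip b1 and the full-length b ion
            neutral = sum(masses[:i])
            ions.append((f"b{i}", z, (neutral + z * PROTON) / z))
        for i in range(2, n + 1):  # skip y1; full-length y kept (assumption)
            neutral = sum(masses[n - i:]) + WATER
            ions.append((f"y{i}", z, (neutral + z * PROTON) / z))
    return ions


for label, z, mz in by_transitions("GAS"):
    print(label, z, round(mz, 4))
```

A stop position, as described in the text, would amount to truncating the two `range` calls; its exact semantics in DIALib are not specified here, so it is left out.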

Library generation
Based on input data and user defined settings, DIALib generates a combination of query peptides with modified sites and single or multiple RTs. The ion library is generated as a text file and contains all the parameters that Peakview requires, including parameters that are fixed for all transitions: relative intensity (set to 1), score (set to 0), prec_y (set to 0), confidence (set to 0.99), and shared (set to FALSE) (Table 1).
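The following Python sketch shows one way such a text library could be written, with the fixed parameters above appended to every transition. The column names, their order, and the tab delimiter are assumptions made for illustration; the exact header that Peakview expects is not given in this section and should be taken from Table 1 or from a library exported by DIALib.

```python
import csv
import io

# Fixed values stated in the text; the column names themselves are hypothetical.
FIXED = {"relative_intensity": 1, "score": 0, "prec_y": 0,
         "confidence": 0.99, "shared": "FALSE"}
COLUMNS = ["Q1", "Q3", "RT", "protein_name", "sequence", *FIXED]


def library_text(transitions):
    """Serialise transitions (dicts with Q1, Q3, RT, protein_name, sequence) as TSV."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for transition in transitions:
        writer.writerow({**transition, **FIXED})
    return buffer.getvalue()


# Placeholder transition, not a real library entry.
example = [{"Q1": 412.5, "Q3": 129.0658, "RT": 7,
            "protein_name": "EXAMPLE", "sequence": "GAS"}]
text = library_text(example)
print(text)
```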

Preparation of human serum-derived Immunoglobulin G
Purified human IgG (I4506, Sigma) was prepared essentially as previously described [16,17]. 5 µg of IgGs were denatured and reduced in a buffer containing 6 M guanidine hydrochloride, 50 mM Tris pH 8, and 10 mM dithiothreitol (DTT) for 30 minutes at 30 ˚C while shaking at 1500 rpm in a MS100 Thermoshaker incubator (LabGene Scientific Ltd.). Reduced cysteines were alkylated by addition of acrylamide to a final concentration of 25 mM followed by 1 h incubation at 30 ˚C in a thermoshaker at 1500 rpm, and excess acrylamide was then quenched by addition of DTT to a final additional concentration of 5 mM. Proteins were precipitated by addition of 4 volumes of 1:1 methanol/acetone and incubation for 16 h at -20 ˚C. After centrifugation at 18,000 rcf at room temperature for 10 min, the supernatant was discarded and the precipitated proteins were resuspended in a 50 mM ammonium bicarbonate buffer. Proteins were digested overnight with trypsin (T6567, Sigma) at 37 ˚C in a thermoshaker at 1500 rpm. Peptides were desalted by ziptipping with C18 Ziptips (ZTC18S960, Millipore).

Mass Spectrometry analysis
Desalted peptides were analyzed by liquid chromatography electrospray ionization tandem mass spectrometry (LC-ESI-MS/MS) using a Prominence nanoLC system (Shimadzu) and a TripleTof 5600 mass spectrometer with a Nanospray III interface (SCIEX) essentially as described [5,18].
Gas and voltage settings were adjusted as required. An MS TOF scan across 350-1800 m/z was performed for 0.5 sec followed by data dependent acquisition (DDA) of up to 20 peptides with intensity greater than 100 counts, across 100-1800 m/z (0.05 sec per spectrum) using a collision energy (CE) of 40 ± 15 V. For data independent (DIA) analyses, MS scans across 350-1800 m/z were performed (0.5 sec), followed by high sensitivity DIA mode with MS/MS scans across 50-1800 m/z, using 34 isolation windows of width 26 m/z, overlapping by 1 m/z, for 0.1 sec, across 400-1250 m/z. CE values for SWATH samples were automatically assigned by Analyst software (SCIEX) based on m/z mass windows.
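The window arithmetic above (width 26 m/z, 1 m/z overlap, 400-1250 m/z, 34 windows) can be checked with a short Python sketch. The function name `swath_windows` is hypothetical, and the edge convention (each window starting 25 m/z after the previous one and extending 26 m/z upward) is an assumption, since the exact boundaries used in the instrument method are not stated here.

```python
def swath_windows(start: int, end: int, width: int, overlap: int):
    """Return (lower, upper) isolation windows stepping by width - overlap."""
    step = width - overlap
    windows = []
    lower = start
    while lower < end:
        windows.append((lower, lower + width))
        lower += step
    return windows


windows = swath_windows(start=400, end=1250, width=26, overlap=1)
print(len(windows))  # 34
print(windows[0], windows[1])  # (400, 426) (425, 451)
```

Under this assumed convention the last window nominally extends 1 m/z past 1250; the count of 34 windows agrees with the acquisition settings in the text.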

Data analysis
The ion libraries generated using DIALib were used to measure the abundance of glycopeptides with different glycan structures using Peakview (v2.2, SCIEX). The Peakview processing settings employed were, unless specifically noted, as follows: all peptides were imported, all transitions/peptide were used for quantification, the confidence threshold was 99%, the False Discovery Rate (FDR) was 1% (except for the oxoniome quantitation, in which 100% FDR was used), and the error allowed was 75 ppm. The width of the XIC RT window varied depending on the analysis performed. For identifying and quantifying N-glycoforms in the yeast cell wall samples [5], an XIC RT window of 25 min, which covered the entire LC gradient, was used with the Y ion library, and an XIC RT window of 2 min was used with the optimized RT by ion library. For identifying and quantifying N-glycoforms in human IgG samples and O-glycoforms in the yeast cell wall samples [5], an XIC RT window of 1 or 2 min was used, respectively. All data was exported from PeakView allowing 100% FDR. To measure site-specific glycan structure, data was first normalized as follows. The signal intensity measurement for all transitions from a given glycopeptide at a particular Q1 and RT was divided by the summed abundance of all detected signal of that glycopeptide for all Q1s at all RTs. The normalized signal intensities obtained from multiple RT and multiple Q1 data were re-arranged using an in-house script called reformat_peptide3.py (Supplementary Information), in which the observed RT was rounded up to the nearest integer and then the data was arranged in a matrix of RT vs m/z window for each peptide. The yeast cell wall MS analyses were performed as biological triplicates [5], while the IgG sample was n=1. Heatmaps were prepared in Prism v7.00 (GraphPad Software, La Jolla California USA). Statistical analyses were performed using a two-tailed t test in Excel (Microsoft).
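The normalization and re-arrangement steps above can be sketched in Python as follows. This is not the reformat_peptide3.py script from the Supplementary Information; the function names, the record layout (dictionaries with `peptide`, `q1`, `rt`, and `intensity` keys), and the interpretation of "rounded up" as a ceiling are assumptions made for illustration.

```python
import math
from collections import defaultdict


def normalise(measurements):
    """Divide each Q1/RT intensity by the peptide's total over all Q1s and RTs."""
    totals = defaultdict(float)
    for m in measurements:
        totals[m["peptide"]] += m["intensity"]
    return [
        {**m, "fraction": m["intensity"] / totals[m["peptide"]]
         if totals[m["peptide"]] else 0.0}
        for m in measurements
    ]


def rt_by_window_matrix(normalised, peptide):
    """Map (RT rounded up to an integer, Q1 window) to summed normalised signal."""
    matrix = defaultdict(float)
    for m in normalised:
        if m["peptide"] == peptide:
            matrix[(math.ceil(m["rt"]), m["q1"])] += m["fraction"]
    return dict(matrix)


# Made-up intensities for a single hypothetical peptide "P".
data = [
    {"peptide": "P", "q1": 412.5, "rt": 6.2, "intensity": 30.0},
    {"peptide": "P", "q1": 437.5, "rt": 6.8, "intensity": 10.0},
    {"peptide": "P", "q1": 437.5, "rt": 7.4, "intensity": 60.0},
]
matrix = rt_by_window_matrix(normalise(data), "P")
print(matrix)
```

The resulting dictionary can be pivoted into an RT (rows) by m/z window (columns) table for plotting as a heatmap.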
Byonic (Protein Metrics v2.13.17) was used to identify N- and O-glycopeptides in the human IgG sample and in the yeast cell wall samples. For both searches, Preview (Protein Metrics v2.13.17) was first used to obtain an estimate of the precursor and fragment m/z errors in the MS files, a key parameter for subsequent searches in Byonic. The searches in Byonic for N-glycans in human IgG were performed with the following settings: Protein database: Homo sapiens proteome UP000005640 downloaded from Uniprot on April 20th 2018 (with 20,303 reviewed proteins) with decoys and common contaminants added; Digestion specificity was set to Fully specific and no missed cleavages were allowed (to only identify glycoforms of the fully cleaved tryptic version of the IgGs, mimicking the Y ion library); and Precursor and Fragment mass tolerance were set to 20 and 30 ppm, respectively. Propionamide was set as a fixed modification. The only variable modification included was a modified version of the Byonic database of 57 N-glycans at the consensus sequence N-X-S/T containing 8 additional glycoforms, set as "Common 1". Only 1 common modification was allowed per peptide.
To identify N- and O-glycans in the cell wall sample, we searched one DDA sample of the wild-type yeast. Byonic searches were performed with the following settings: Protein database: Saccharomyces cerevisiae proteome UP000002311 downloaded from Uniprot on April 20th 2018 (with 6049 reviewed proteins) with decoys added; Digestion specificity was set to Fully specific and one missed cleavage was allowed; Precursor and Fragment mass tolerance were set to 40 ppm and 1 Da, respectively. Propionamide was set as a fixed modification. The modifications allowed included a homemade database of yeast N-glycans at the consensus sequence N-X-S/T, set as "Common 1", or O-glycans at S/T set as "Rare 1". The N-glycan database contained structures with 2 HexNAc and 1 to 15 Hex, and the O-glycan database contained structures with 1-12 Hexoses. Only 1 common and 1 rare modification were allowed per peptide.

Automated construction of theoretical peptide libraries
Measuring the abundance of peptides using DIA approaches requires carefully crafted ion libraries. These libraries contain five key elements: the mass/charge (m/z) of the precursor ion (Q1), the m/z of the fragment ions (Q3), the RT of the precursor ion, the protein name, and the amino acid sequence of the precursor ion (Table 1). Each Q1/Q3 pair (transition) at a specific RT should be unique and can be used for measurement of a specific peptide. Ion libraries for SWATH are typically constructed from protein identifications based on DDA data using database search software such as ProteinPilot. While this approach for generating ion libraries is efficient and convenient, these libraries will not contain peptides that were not identified by DDA analysis, such as low abundance peptides or those with PTMs not included in the search parameters. Therefore, libraries generated by this approach only include a subset of the potentially detectable peptides in a SWATH experiment. This limitation is especially severe for glycopeptides, because glycans are structurally heterogeneous and difficult to identify with standard proteomic database search methods [11,12].
To overcome the limitation of DDA-driven libraries, tailored libraries can be constructed that include the desired Q1/Q3 pairs and corresponding RTs. Values for specific Q1/Q3 pairs and RTs can be obtained from manual exploration of the raw MS data, as we have previously described [5]. However, manual exploration of raw data and manual library construction is time consuming and error prone. Alternatively, libraries can be constructed with theoretical Q1/Q3 values. Here, we present DIALib, software that expedites and automates library construction for theoretical peptides and glycopeptides for use in DIA analysis (Figure 1).

Figure 1. DIALib generates customized ion libraries for interrogation of post-translational modifications including protein glycosylation in Data Independent Acquisition MS data.
The user inputs/selects the key parameters for construction of the ion library: amino acid sequence (or Uniprot ID), desired fixed and variable post-translational modifications, desired type of fragment ions (b, y, Y, or oxonium), retention time, and Q1. DIALib calculates the corresponding Q3 values and generates an ion library text file formatted for PeakView (SCIEX). If Y ions (e.g. HexNAc) are selected, DIALib searches for Asparagine (N) residues in the context of a sequon (e.g. NST, N in yellow) within the peptide sequences input, and constructs the Y series (Y0, no added sugars; Y1, with one HexNAc (blue square); Y2, with two HexNAc).
[Figure 1 panel labels: User Input; Theoretical Ion Library; Y-ion transitions; b- and y-ion transitions]
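The sequon search and Y series construction described in the Figure 1 legend can be sketched in Python as below. The function names are hypothetical, the regular expression uses the common N-X-S/T definition with X not proline, static cysteine modifications are ignored, and the HexNAc residue mass is the standard monoisotopic value. It is a sketch under these assumptions, not the DIALib implementation.

```python
import re

# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER, HEXNAC = 1.007276, 18.010565, 203.079373


def sequon_sites(peptide: str):
    """Return 0-based positions of N in N-X-S/T sequons (X is not P)."""
    # The lookahead reports overlapping sequons as well.
    return [m.start() for m in re.finditer(r"(?=N[^P][ST])", peptide)]


def y_series(peptide: str, max_hexnac: int = 2, max_charge: int = 2):
    """Return (label, charge, m/z) for Y0..Yn: intact peptide plus 0..n HexNAc."""
    if not sequon_sites(peptide):
        return []
    neutral = sum(MONO[aa] for aa in peptide) + WATER
    return [
        (f"Y{n}", z, (neutral + n * HEXNAC + z * PROTON) / z)
        for n in range(max_hexnac + 1)
        for z in range(1, max_charge + 1)
    ]


print(sequon_sites("GANSTK"))
for label, z, mz in y_series("GANSTK"):
    print(label, z, round(mz, 4))
```

With Y0, Y1, and Y2 at charge states 1 and 2, the sketch yields six transitions per peptide.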
DIALib allows facile construction of SWATH ion libraries, including selection of pre-defined theoretical values and input of user-customized values for variables, including static modifications (alkylation of cysteine residues), non-glycan PTMs, Q1 (m/z windows), RT, z, and Q3 (b, y, Y, and/or oxonium ions) (Table 1). One important feature of DIALib is the Q1 selection tool. During a SWATH experiment, the precursor m/z spectral space is divided into windows that can have the same or variable m/z size, typically with 1 m/z overlap between adjacent windows. DIALib allows the user to input arbitrary windows corresponding to any actual DIA experiment or to select a default fixed window method. Similarly, the user can input arbitrary customized RT values, or select one or more integer RT values from 1 to 60 min.

DIALib theoretical Y ion libraries can measure N-glycopeptides in a complex sample
We first tested if a Y ion library constructed with DIALib could reproduce results obtained with a manually curated glycopeptide library. To do this, we chose our previously published yeast cell wall glycoproteomics dataset, in which we had measured site-specific glycan structure and occupancy in yeast cell wall glycoproteins using a manually curated ion library of b, y, and oxonium ions [5]. We used DIALib to construct a Y ion library for the eight glycopeptides that were characterized in detail in this previous work [5]. We initially took a stringent theoretical approach, and made no assumptions about the m/z or RT of detectable versions of these glycopeptides. The Y ion library contained a maximum of 6 transitions (Y0, Y1, and Y2, and z = 1 or 2) for each of the 34 Q1s or m/z windows, and a fixed RT for all glycopeptides (Supplementary Tables S1 and S3). Using the DIALib Y ion library, we re-interrogated the published SWATH dataset [5] and measured the abundance of each glycopeptide in each m/z window over the full LC elution timeframe. We exported the data allowing 100% FDR, to provide a complete data matrix (Supplementary Table S4). To facilitate comparison between yeast strains, we displayed the data as heatmaps ([...] that were statistically significantly different between wild type and mutant yeast strains (P < 0.05) (Figure 3). The measurements obtained with the manually curated library were robust, and identified many statistically significant differences in glycoform abundance between the wild type yeast and the glycosylation mutants, especially in the mannosyltransferase mutants (alg3Δ, [...] Thus, for some glycopeptides, the Y ion measurements were robust, while for another glycopeptide the higher variability between samples led to identification of fewer statistically significant differences between the wild type and mutant yeast strains (Figure 3).
Together, this suggested that in the absence of DDA data on peptide fragmentation and RT, DIALib's Y ion library is an excellent tool to obtain an overview of the different glycoforms in a sample, but may not by itself be a robust method to search for statistically significant differences in glycoform abundance between complex samples. [...] Figure 2G-I and Supplementary Figure S1 [...] Table S5). Excluding fragment ions containing the asparagine residue in the stopby libraries ensured that only transitions common to the unglycosylated and glycosylated forms of the same peptide were present in the library. The poor performance in glycoform measurement of the by/stopby libraries compared to the Y ion libraries was unexpected. We considered two possibilities that might explain the inaccurate peak picking that was the underlying cause of this poor performance: that the by/stopby libraries used were not analytically optimal, as they contained all (or most) theoretical by/stopby fragment ions for each glycopeptide; or that the RT window used was too wide, as it covered the entire LC gradient. Indeed, we observed that reducing the number of transitions in the by library while maintaining a large XIC RT window tended to enhance peak picking (data not shown). However, the number of transitions that gave the best peak picking was peptide-dependent (data not shown), and thus could not be set as a fixed parameter for all glycopeptides.
Next, we tested if narrowing the XIC RT window would improve the performance of the by, stopby, stopby+Y, and Y libraries, while maintaining all theoretical transitions in the library.
Since a narrow XIC RT window requires an accurate RT for each peptide, we constructed by, stopby, stopby+Y, and Y libraries using the manually optimized RT from the published library [...] 2 and 4). In general, the heatmaps obtained using the stopby library had lower background than when the by library was used (i.e. no measurements where no peptides were expected), especially when an optimized RT was employed (Figure 4 and Supplementary Figures 2 and 3). The improvement in peak picking for the RT-optimized libraries was also demonstrated by the higher R² correlation scores obtained when comparing site-specific glycan structural heterogeneity between the RT-optimized libraries and the manually curated library (Figure 4B).
Of all the libraries used in this study, the RT-optimized stopby+Y library was the best performing, not only because it consistently displayed one of the highest overall R² correlation scores for all 8 glycopeptides (0.84 ± 0.14, Figure 4C), but also because it could detect and measure both glycan occupancy and structure simultaneously (Figures 2-4 and Supplementary Figures S1 and S2). Nevertheless, it is important to point out that in the absence of prior RT information, the Y ion library still considerably outperformed all other libraries tested in this work (Figure 2 and Supplementary Figure S2). Overall, these results indicated that in order for the DIALib libraries to optimally detect and measure glycoforms in a complex sample, prior optimization of the RT for each peptide and a narrow XIC window are required.
[...] Table S8). This approach also allowed measurement of glycoforms with m/z values that fell in the same m/z window, but whose glycan structures altered the glycopeptide RT. To process the data, we first normalized the signal intensities of each Q1/RT pair to the total signal intensity for the sample (all Q1 at all RT). We organized the normalized data as a RT vs m/z window matrix (Figure 5). To simplify the matrix due to differences in the observed RT at different Q1s, we rounded up the RTs for each measurement to the nearest integer (using an in-house script, reformat_peptide3.py) (Figure 5 and Supplementary Information). Figure 5 shows heatmaps summarizing the results for the IgG glycopeptides. The majority of the signal for IgG1 was concentrated at RT 7 min in multiple Q1s, suggesting that multiple glycoforms of the same IgG1 glycopeptide eluted at 7 min. Similarly, most glycoforms of IgG2 eluted at 8 min (Figure 5). As expected, the Y ion glycan profile for IgG3 and IgG4 was identical because their glycopeptides have isobaric sequences (Figure 5). We validated this [...] Table S9). These results highlight the potential improved sensitivity of DIA compared with DDA analyses. Together, the data demonstrated that the DIALib Y ion library can successfully profile, detect, and measure mammalian N-glycopeptides.

Figure 5. Glycoprofiling human Immunoglobulin G using a DIALib Y ion library.
Glycopeptide profiles across all retention times (RT, min) and m/z windows for each IgG isotype.
[...] and S11, respectively). We constructed a DIALib library focusing on one O-glycopeptide: [...] Table S13). As described above for the IgG analysis, we arranged the normalized data into an RT vs m/z window matrix for each yeast strain. Figure 6 shows an example of the RT vs Q1 matrix for the WT strain, although the heatmaps for all the strains were very similar (Figure 6, top panel). The four glycoforms of the [...]
[Heatmap axis labels: m/z windows 437 to 1237 in steps of 25; strains WT, ost3Δ, ost5Δ, ost6Δ, alg6Δ, alg8Δ, die2Δ, alg3Δ, alg9Δ]
[...] Table S14). DIALib includes a standard catalogue with additional oxonium ions such as from sialic acids, which we did not include in this analysis of the yeast cell wall glycoproteome. We used this library to profile cell wall samples from wild type yeast and the nine glycosylation mutants. As above, we searched the entire LC gradient and m/z window range using an XIC RT window of 2 min (Supplementary Table S15). To profile the overall oxoniome, we summed the signal intensity of all eight oxonium ions in each RT and m/z window (30 retention times x 34 m/z windows = 1020 data points per strain) and then normalized this value to the total oxonium signal for each strain (sum of the 1020 data points per strain). The data was rearranged to depict a heatmap of RT vs m/z windows (Figure 7). Figure 7B shows the results of oxonium ion profiling as heatmaps for wild type yeast, and the ost3Δ and alg3Δ mutants. Most of the signal detected was concentrated in high m/z windows between 5-20 min, as expected for typical m/z and elution characteristics of glycopeptides (Figure 7B). Ost3 is a subunit of the OTase, and the ost3Δ mutant therefore transfers the same glycan to proteins as wild type, but at lower efficiency [5]. This is reflected in the relatively minor differences in oxoniome profile between wild type and ost3Δ cells.
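The oxoniome normalization described above (summing the oxonium ion signal in each RT and m/z window cell, then dividing by the total over all 1020 cells) can be sketched in Python as follows. The function name `oxoniome_profile`, the dictionary layout, and the uniform placeholder intensities are assumptions for illustration; real values would come from the Peakview export.

```python
def oxoniome_profile(intensity):
    """intensity maps (rt, window) to a list of oxonium ion intensities.

    Returns the summed signal of each cell as a fraction of the strain total.
    """
    summed = {cell: sum(values) for cell, values in intensity.items()}
    total = sum(summed.values())
    if total == 0:
        raise ValueError("no oxonium signal detected")
    return {cell: value / total for cell, value in summed.items()}


# Placeholder grid: 30 retention times x 34 m/z windows, 8 oxonium ions per cell.
retention_times = range(1, 31)
window_midpoints = [412.5 + 25 * i for i in range(34)]
grid = {(rt, mz): [1.0] * 8 for rt in retention_times for mz in window_midpoints}

profile = oxoniome_profile(grid)
print(len(profile))  # 1020
```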
A useful way to depict differences between strains is the use of subtraction heatmaps, in which each data point is the normalised intensity measured for wild-type less the normalised intensity measured for a given strain (Figure 7C). In agreement with the data obtained using the site-specific glycopeptide ion libraries above (Figures 1-3) and [5], we observed that mutant strains that had small effects on glycosylation occupancy and glycan structure also displayed small differences to wild type yeast in their oxonium subtraction maps (e.g. the die2Δ and ost5Δ strains, Figure 7C). Also as before, we observed that the mannosyltransferase mutants alg3Δ, alg9Δ, and alg12Δ, and the OST mutant ost3Δ displayed the highest difference compared with wild type (Figure 7C). We could also use oxonium ion data to profile the overall monosaccharide composition of the glycoproteome in each yeast strain.
Consistent with the expected high mannose N-glycans and O-glycans in yeast glycoproteins, we measured higher signal for Hex than for HexNAc oxonium ions (Figure 7D). Strains lacking the mannosyltransferases Alg3, Alg9, and Alg12 transfer truncated N-glycans to protein (GlcNAc(2)Man(5), GlcNAc(2)Man(6), or GlcNAc(2)Man(8), respectively) (Figures 1-3) and [5]. We therefore predicted that oxoniome profiling would detect a different ratio of Hex:HexNAc in these strains compared to wild type yeast. Indeed, we observed statistically significant differences in the proportion of the corresponding oxonium ions between wild type and the alg3Δ and alg9Δ strains (Figure 7D). In these strains, the relative abundance of Hex oxonium ions was significantly lower and of HexNAc oxonium ions significantly higher than in wild type (P < 0.05, Figure 7D). Together, profiling the oxoniome with DIALib libraries was able to detect the expected differences in the glycoproteomes of yeast glycan biosynthesis mutants, demonstrating that this approach provides a rapid and robust method of profiling glycosylation differences between samples. However, such optimisation is peptide- and sample-specific, and requires expertise and careful manual implementation.
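The subtraction heatmaps and the Hex versus HexNAc proportions discussed above reduce to simple arithmetic, sketched here in Python. The function names and the example numbers are hypothetical; the inputs are assumed to be normalised profiles keyed by (RT, m/z window) and summed oxonium signal per monosaccharide class.

```python
def subtraction_map(wild_type, strain):
    """Wild-type normalised intensity minus strain intensity, cell by cell."""
    cells = set(wild_type) | set(strain)
    return {c: wild_type.get(c, 0.0) - strain.get(c, 0.0) for c in cells}


def monosaccharide_proportions(signal_by_class):
    """Express summed oxonium signal per class (e.g. Hex, HexNAc) as fractions."""
    total = sum(signal_by_class.values())
    if total == 0:
        raise ValueError("no oxonium signal detected")
    return {name: value / total for name, value in signal_by_class.items()}


# Made-up normalised profiles for illustration only.
wt = {(7, 412.5): 0.5, (8, 437.5): 0.5}
mutant = {(7, 412.5): 0.25, (9, 462.5): 0.75}
difference = subtraction_map(wt, mutant)
print(difference)

proportions = monosaccharide_proportions({"Hex": 3.0, "HexNAc": 1.0})
print(proportions)  # {'Hex': 0.75, 'HexNAc': 0.25}
```

Cells missing from one profile are treated as zero signal, so positive values mark signal that is lower in the mutant strain and negative values mark signal that is higher.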
Beyond glycopeptides, the utility provided by DIALib also enables the study of other PTMs, in isolation or combination, in complex proteomes from diverse biological and clinical samples.
[...] Consortium of Glycobiology Editors, La Jolla, California. All rights reserved.: Cold Spring Harbor (NY).
12. Thaysen-Andersen, M., N.H. Packer, and B.L. Schulz, Maturing Glycoproteomics Technologies Provide Unique Structural Insights into the N-glycoproteome and It

Supplementary Material
Supplementary Script S1. reformat_peptide3.py. Round up RT and rearrange the data table into an RT vs windows matrix.
Supplementary Table S1. Example of details from a DIALib Y ion library formatted for use in Peakview.

Supplementary Figures
Supplementary Figure S1. DIALib Y ion measurement of site-specific glycan structural heterogeneity for eight cell wall yeast glycopeptides. Abundance of glycoforms of glycopeptides containing Gas1 N40GS, Gas1 N95TT, Gas1 N253LS, Gas3 N350VS, Crh1 N177YT, Gas1 N57ET, Ecm33 N304FS, and Gas3 N269ST measured using a DIALib Y ion library or a manually curated ion library [5] in wild type yeast and yeast with mutations in the N-glycan