Evaluation of RNAlater™ as a field-compatible preservation method for metaproteomic analyses of bacteria-animal symbioses

Field studies are central to environmental microbiology and microbial ecology as they enable studies of natural microbial communities. Metaproteomics, the study of protein abundances in microbial communities, allows these communities to be studied in situ, which requires protein preservation directly in the field because protein abundance patterns can change rapidly after sampling. Ideally, a protein preservative for field deployment works rapidly and preserves the whole proteome, is stable in long-term storage, is non-hazardous and easy to transport, and is available at low cost. Although these requirements might be met by several protein preservatives, an assessment of their suitability under field conditions for metaproteomics is currently lacking. Here, we compared the protein preservation performance of flash freezing and the preservation solution RNAlater™ using the marine gutless oligochaete Olavius algarvensis and its symbiotic microbes as a test case. In addition, we evaluated long-term RNAlater™ storage after 1 day, 1 week and 4 weeks at room temperature (22-23 °C). We evaluated protein preservation using one-dimensional liquid chromatography tandem mass spectrometry (1D-LC-MS/MS). We found that RNAlater™ and flash freezing preserved proteins equally well in terms of the total number of identified proteins and the relative abundances of individual proteins, and that none of the tested storage time points differed from t0. Moreover, we did not find biases against specific taxonomic groups or proteins with particular biochemical properties. Based on our metaproteomics data and the logistical requirements for field deployment, we recommend RNAlater™ for protein preservation of field-collected samples intended for metaproteomics.

Importance
Metaproteomics, the large-scale identification and quantification of proteins from microbial communities, provides direct insights into the phenotypes of microorganisms on the molecular level.
To ensure the integrity of the metaproteomic data, samples need to be preserved immediately after sampling to avoid changes in protein abundance patterns. In laboratory set-ups, samples for proteomic analyses are most commonly preserved by flash freezing; however, liquid nitrogen or dry ice is often unavailable at remote field locations due to its hazardous nature and transport restrictions. Our study shows that RNAlater™ can serve as a low-hazard, easy-to-transport alternative to flash freezing for field preservation of samples for metaproteomics. We show that RNAlater™ preserves the metaproteome as well as flash freezing does and that protein abundance patterns remain stable during long-term storage for at least 4 weeks at room temperature.


Field studies are central to environmental microbiology and microbial ecology as they allow for the in situ study of microbial communities and their interactions with the biotic and abiotic environment.

9 µl of IAA solution (0.05 M iodoacetamide in UA solution) and then incubated samples at 22 °C for 20 min in the dark. We removed the IAA solution by centrifugation followed by three wash steps with 100 µl of UA solution. Subsequently, we washed the filters three times with 100 µl of ABC buffer (50 mM ammonium bicarbonate). We added 1.6 µg of Pierce MS grade trypsin (Thermo Fisher Scientific) in 40 µl of ABC buffer to each filter. Filters were incubated overnight in a wet chamber at 37 °C. The next day, we eluted the peptides by centrifugation at 14,000 × g for 20 min followed by the addition of 50 µl of 0.5 M NaCl and another centrifugation step. Peptides were quantified using the Pierce MicroBCA Kit (Thermo Fisher Scientific) following the instructions of the manufacturer.

We processed samples of the preservation method replication and the RNAlater™ time series similarly to the preservation method comparison samples, with the following modifications: we added 60 µl of SDT-lysis buffer instead of 50 µl and boiled samples at 95 °C for 10 min. To minimize sample loss, we did not do the 5 min centrifugation step at 21,000 × g described in the original protocol [43] and instead mixed the complete 60 µl of each lysate with 400 µl of UA solution in a 10 kDa MWCO 500 µl centrifugal filter unit. All subsequent steps were identical to the sample preparation for the preservation method comparison, with the exception that we added 0.62 µg and 0.54 µg of Pierce MS grade trypsin (Thermo Fisher Scientific) in 40 µl of ABC buffer to each filter for the repetition of the preservation method comparison and the RNAlater™ time series, respectively.

All samples were analyzed by 1D-LC-MS/MS. Detailed instrument set-ups, gradients and methods are specified in Additional file S1. In brief, all samples were loaded onto a C18 Acclaim PepMap 100 pre-column and separated on an Easy-Spray PepMap C18, 75 µm × 75 cm analytical column (Thermo Fisher Scientific) using reverse phase liquid chromatography. Eluting peptides were ionized with electrospray ionization and mass spectra were acquired using a data dependent acquisition method in a Q-Exactive Orbitrap mass spectrometer (Thermo Fisher Scientific).

Protein identification and quantification
inferred proteins based on the results from a search against a target-decoy database. Proteins with a q-value of <0.01 were categorized as high-confidence identifications and proteins with a q-value of 0.01-0.05 were categorized as medium-confidence identifications. We combined search results for all samples into a multiconsensus report in Proteome Discoverer and only proteins identified with medium or high confidence were retained, resulting in an overall protein-level FDR of 5%. For protein quantification, normalized spectral abundance factors (NSAFs [46]) were calculated per species and multiplied by 100 to give the relative protein abundance in %.
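As an illustration of the per-species NSAF quantification, the following minimal Python sketch uses the standard NSAF definition: the spectral count of each protein is divided by its length, and the result is normalized by the sum of these ratios. The function name, the input layout (species, protein, PSM count, length) and the normalization within each species are assumptions made for this sketch and do not reproduce the Proteome Discoverer workflow used in the study.

```python
from collections import defaultdict


def nsaf_percent_per_species(records):
    """Return {protein: NSAF * 100}, normalized within each species.

    records: iterable of (species, protein, psm_count, length_aa) tuples.
    This input layout is hypothetical and chosen only for the sketch.
    """
    # Spectral abundance factor (SAF): spectral count divided by protein length.
    saf_by_species = defaultdict(dict)
    for species, protein, psm_count, length_aa in records:
        saf_by_species[species][protein] = psm_count / length_aa

    result = {}
    for species, saf in saf_by_species.items():
        total = sum(saf.values())
        for protein, value in saf.items():
            # Normalize by the species-wide SAF sum and express in percent.
            result[protein] = 100.0 * value / total if total > 0 else 0.0
    return result
```

With this normalization the values of all proteins assigned to one species sum to 100%, so abundances can be compared within a species but not directly across species.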
Outlier identification and removal
We classified samples as outliers if at least two of the following criteria were met: i) the total ion chromatogram intensity was below 1 × 10⁹; ii) the z-score (the number of standard deviations above or below the mean) of the number of identified proteins (filtered for 5% FDR) exceeded ±1; iii) the number of identified proteins (filtered for 5% FDR) was more than one standard deviation below the mean number of identified proteins of all samples within a group (Additional file S2). In addition, we applied the generalized extreme Studentized deviate (ESD) test (significance level of 0.5, maximum of 10 outliers) to the number of identified proteins in the RNAlater™ time series for outlier identification. This procedure was not applied to the preservation method comparison and the replication of the preservation method comparison due to an insufficient number of replicates. In total, we identified 2 samples of the preservation methods comparison, 2 samples of the repetition of the methods comparison and 8 samples of the RNAlater™ time series as outliers (Additional file S2). Identified outliers were excluded from all subsequent analyses.

In addition, we checked the metaproteomes for evidence of accidental sampling of the co-occurring marine gutless oligochaete Olavius ilvae [47]. O. ilvae cannot be easily distinguished from O. algarvensis during sampling as O. algarvensis and O. ilvae are highly similar in size, shape and color. However, they harbor distinct symbionts, which can be used to distinguish between the species [48]. To test whether any of our samples was a specimen of O. ilvae, we created a custom database including protein sequences of the α7-symbiont, Cand. Thiosymbion sp., γ3-symbiont and δ3-symbiont of O. ilvae. In addition, we also included protein sequences of Cand. Thiosymbion algarvensis and the δ1-symbiont of O. algarvensis for testing. The database was then loaded into Proteome Discoverer and proteins were identified as described above. One sample of the RNAlater™ time series was identified as O. ilvae and therefore removed as an outlier (Additional file S2).
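The two-out-of-three outlier rule can be sketched in Python as follows. This is a hedged illustration, not the authors' script: the dictionary keys, the function name, the use of the sample standard deviation, and the choice to compute the z-score across all supplied samples while computing criterion iii) within each group are assumptions of this sketch. The ESD test is not included.

```python
from statistics import mean, stdev


def flag_outliers(samples, tic_threshold=1e9):
    """Return names of samples meeting at least two of the three criteria.

    samples: list of dicts with the hypothetical keys
    'name', 'group', 'tic' and 'n_proteins'.
    """
    counts = [s["n_proteins"] for s in samples]
    overall_mean = mean(counts)
    overall_sd = stdev(counts) if len(counts) > 1 else 0.0

    # Collect protein counts per group for criterion iii).
    by_group = {}
    for s in samples:
        by_group.setdefault(s["group"], []).append(s["n_proteins"])

    flagged = []
    for s in samples:
        group_counts = by_group[s["group"]]
        group_mean = mean(group_counts)
        group_sd = stdev(group_counts) if len(group_counts) > 1 else 0.0
        z = (s["n_proteins"] - overall_mean) / overall_sd if overall_sd else 0.0
        criteria = [
            s["tic"] < tic_threshold,                # i) low total ion chromatogram intensity
            abs(z) > 1,                              # ii) z-score beyond +/-1 (assumed: all samples)
            s["n_proteins"] < group_mean - group_sd, # iii) > 1 SD below the group mean
        ]
        if sum(criteria) >= 2:
            flagged.append(s["name"])
    return flagged
```

Requiring two criteria rather than one keeps a sample with, for example, a slightly low protein count but normal signal intensity in the dataset.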

To determine which identified proteins were shared by all samples or unique to specific treatments/time points we loaded the 5% FDR filtered PSM multiconsensus files into Perseus 1.6.5.0 (Tyanova et al., 2016), filtered out proteins that did not have at least 75% valid values (greater than 0) in at least one group and log2 transformed the data. We then calculated the overlapping protein sets with the numerical Venn function in Perseus and visualized the results with a Venn calculation tool from Ghent University (http://bioinformatics.psb.ugent.be/webtools/Venn/) using the default settings.
For hierarchical clustering we loaded the 5% FDR filtered NSAF multiconsensus files into Perseus, filtered out proteins that did not have at least 75% valid values in at least one group and log2 transformed the data. We replaced invalid values with a constant value and z-score normalized the resulting matrix by rows (proteins). Subsequently, we performed hierarchical clustering with the following settings: Euclidean distance, preprocessing with k-means and average linkage.

For differential protein abundances we loaded the 5% FDR filtered NSAF multiconsensus files into Perseus. We grouped samples by preservation method/time point and filtered proteins for 75% valid values in at least one group to only use consistently identified proteins. We replaced missing values by a constant and performed a two-sided Welch's t-test using a permutation-based false discovery rate of 5% to account for multiple hypothesis testing.

For the differential protein abundance analysis of the 1000 most abundant proteins we loaded the 5% FDR filtered NSAF multiconsensus file, calculated the NSAF sum across all treatments and sorted proteins from most to least abundant. We selected the 1000 most abundant proteins and re-normalized the abundance values based on the selected subset. We then loaded the resulting matrix into Perseus, grouped samples by preservation method/time point, log2 transformed the data and replaced missing values by a constant. We used the resulting matrix as input data for volcano plots based on a t-test with an FDR of 0.05 and S0 of 0.1.

We calculated relative abundances for each species in the symbiosis using the method for assessing the proteinaceous biomass described by Kleiner et al. [49], with the following modification. Instead of using FidoCT for protein inference in Proteome Discoverer and filtering for proteins with at least 2 protein unique peptides, we used Sequest HT for protein inference and filtered for proteins with at least 2 protein unique peptides. We visualized the results with the ggplot package in R [50,51].

To assess biochemical properties of all identified proteins we obtained the number of amino acids and predicted isoelectric points (pI) from Proteome Discoverer. We predicted transmembrane helices (TMHs) with the TMHMM Server 2.0 [52]. Protein sequences of all identified proteins for each study were used as input data.
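The preprocessing used before clustering (valid-value filter, log2 transformation, constant replacement of invalid values, row-wise z-scoring) was carried out in Perseus. The Python sketch below only illustrates the logic of these steps. The function names, the data layout, the fill constant of -10 and the use of the sample standard deviation are assumptions made for illustration, not values reported in the study.

```python
import math
from statistics import mean, stdev


def filter_valid(matrix, groups, min_fraction=0.75):
    """Keep proteins with >= min_fraction valid (> 0) values in at least one group.

    matrix: {protein: [abundance per sample]}; groups: group label per sample.
    """
    kept = {}
    for protein, values in matrix.items():
        for group in dict.fromkeys(groups):  # unique labels, original order
            in_group = [v for v, g in zip(values, groups) if g == group]
            if sum(v > 0 for v in in_group) / len(in_group) >= min_fraction:
                kept[protein] = values
                break
    return kept


def log2_impute_zscore(values, fill_value=-10.0):
    """log2 transform one protein row, replace invalid values, then z-score.

    fill_value is a hypothetical constant on the log2 scale.
    """
    logged = [math.log2(v) if v > 0 else fill_value for v in values]
    row_mean = mean(logged)
    row_sd = stdev(logged)
    if row_sd == 0:
        return [0.0] * len(logged)
    return [(x - row_mean) / row_sd for x in logged]
```

A protein detected in every flash frozen sample but in no RNAlater™ sample passes the filter, which is what allows treatment-specific proteins to appear in the overlap analysis.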

We compared flash freezing and RNAlater™ preservation to determine if RNAlater™ is a suitable method for preservation of field-collected samples targeted for metaproteomic analyses. We used the marine gutless oligochaete O. algarvensis and its bacterial endosymbionts as our test system. To simulate field conditions, we also conducted a time series to assess how metaproteomes were affected by storage of samples in RNAlater™ for up to 4 weeks at room temperature.

Similar numbers of proteins identified for both preservation methods and all RNAlater™ storage time points
We identified similar numbers of proteins for both flash frozen and RNAlater™ preserved samples. On average, we identified 5,934 proteins in flash frozen samples and 5,780 proteins in RNAlater™ preserved samples (Fig. 1A). The average number of identified proteins between flash frozen samples and RNAlater™ preserved samples was not significantly different (Student's t-test, p > 0.05). This suggests that neither of the tested methods outperforms the other in terms of total number of identified proteins.

The number of identified proteins was stable across the four tested storage time points. On average, we identified between 4,111 and 4,278 proteins per time point (Fig. 1B). None of the total protein numbers were significantly different as compared to t0, the starting point of the RNAlater™ incubation (Student's t-test, p > 0.05). While the manufacturer recommends that samples should be stored at 4 °C if storage exceeds 1 week, our results suggest that proteins are well preserved for at least 4 weeks at room temperature.

formed groups based on method or time point (Fig. 2A-D). For these analyses we only included proteins that were consistently detected in at least one of the treatments/time points by filtering out proteins that were not detected in at least 75% of samples for at least one condition.

Of 9,326 proteins identified in the preservation methods comparison (Additional file S3), 5,859 proteins remained after filtering and thus were considered to be consistently identified in at least one of the treatments. Out of these 5,859 proteins almost all (5,797) were shared between flash frozen samples and samples preserved in RNAlater™ (Fig. 2A). The hierarchical clustering of these samples based on protein abundances revealed multiple shared nodes in the dendrogram between flash frozen and RNAlater™ preserved samples (Fig. 2B). In case of a systematic bias introduced by the preservation method we would expect separation of samples based on preservation method with no shared nodes between preservation methods. These data suggest that the preservation methods did not introduce a systematic bias.

were shared across all four storage time points (Fig. 2C). We identified 1 unique protein for samples incubated for 1 day, whereas none of the other time points had unique proteins. Moreover, a few proteins were shared between two or three out of the four different time points. The hierarchical clustering of these samples revealed multiple shared nodes between samples of all time points (1 day, 1 week and 4 weeks) and samples of t0 (Fig. 2D). If a systematic bias had been introduced by long-term storage at room temperature we would expect separation of samples based on time point with no shared nodes between test time points and t0. These data suggest that long-term storage at room temperature did not introduce a systematic bias.

transformed. The hierarchical clustering was based on Euclidean distance. Z-score values were calculated for each protein and thus positive Z-scores indicate a relative abundance higher than the mean, while negative Z-scores indicate a relative abundance lower than the mean.
Only minor differences detected in relative abundances of individual proteins across preservation methods or time points
We evaluated relative protein abundances across preservation methods and storage time points to assess potential alterations in protein abundances introduced by method or time. For these analyses we used the same dataset as above, including only proteins that were consistently detected in at least one of the treatments/time points. We used a two-sided Welch test to identify significant differences in protein abundances between methods and time points.

We selected the 1000 most abundant proteins from each study and calculated the average relative protein abundances for each method or time point to lower the influence of biological variability between individual worms. We log2 transformed the data and visualized the differences with volcano plots based on a t-test with an FDR of 0.05 and S0 of 0.1 (Fig. 3A-E). The S0 parameter indicates the relative importance of the t-test p-value, meaning that even if a protein has a significant p-value, it will not be considered statistically significant if the fold change is below the S0 value. If protein abundances significantly differed between treatments/time points they would appear above the S0 line in the plot, whereas if their abundances did not significantly differ their values would be below the S0 line. We were unable to identify any significant differences in the 1000 most abundant proteins identified in the preservation method comparison (Fig. 3A) or RNAlater™ time series (Fig. 3B-D). To further emphasize this finding we included an example study of Escherichia coli which was grown in either oxic or anoxic conditions (Fig. 3E). As expected, there were several differentially expressed proteins between the two growth conditions as indicated by data points above the S0 line in Figure 3E.
In summary, our analysis revealed no significant changes in protein abundances between the tested preservation methods, as well as between t0 and the storage

represent S0 with a value of 0.1. Data points above the line represent proteins whose abundances significantly differed between comparisons whereas data points below the line represent proteins whose abundances did not significantly differ between comparisons.

Effects on microbial community structure
To investigate potential effects of preservation method or storage time on the representation of specific taxa in the metaproteome, we compared the proteinaceous biomass of each community member using a method adapted from Kleiner et al. [49]. This method enables calculations of proteinaceous biomass contributions of species in microbial communities by using protein abundances derived from metaproteomic analyses.

We found a small but significant difference in proteinaceous biomass of the host and Cand. T. algarvensis in the preservation methods comparison (Student's t-test, p-value < 0.05) (Fig. 4A, Supplemental material Table S4

Table S3, Supplemental material Figure S1). In this repeat experiment of the preservation method comparison we did not observe any significant differences in proteinaceous biomass for the host or Cand. T. algarvensis or any other taxa (Fig. 5B, Supplemental material Table S6-7, Figure S1). This suggests that there is either no effect or a small inconsistently occurring effect on taxa representation introduced by flash freezing and RNAlater™ preservation.

In the RNAlater™ time series, measured biomass abundances of species were relatively consistent between individual worms (Fig. 4C, Supplemental material Table S8).
The only exception was the 3-symbiont that was detected in only 3 individuals, which is in line with the fact that this symbiont has been shown to be only present in a minority of individuals [8]. None of the symbiont or host biomasses were significantly different when later time points were compared to t0 (Student's t-test,

respectively. Identified proteins were filtered for 5% FDR and at least 2 protein unique peptides (PUP) prior to counting as described in [49]. Asterisks indicate significant differences in per-species biomass (Student's t-test, p-value < 0.05) (Supplemental material Table S4-9).
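The per-species biomass estimate used in this section can be sketched as follows. The sketch assumes that the biomass contribution of a species is estimated as its share of all peptide-spectrum matches (PSMs) after keeping only proteins with at least 2 protein unique peptides. That reading of the cited method, the function name and the input layout are assumptions of this illustration.

```python
def biomass_percent(proteins, min_unique_peptides=2):
    """Return {species: % of PSMs} over proteins passing the unique-peptide filter.

    proteins: list of dicts with the hypothetical keys
    'species', 'psms' and 'unique_peptides'.
    """
    totals = {}
    for p in proteins:
        # Count PSMs only for proteins with enough protein unique peptides.
        if p["unique_peptides"] >= min_unique_peptides:
            totals[p["species"]] = totals.get(p["species"], 0) + p["psms"]

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {species: 100.0 * n / grand_total for species, n in totals.items()}
```

Computed per worm, these percentages are the kind of values that would then be compared between preservation methods or time points with a t-test.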

The two tested preservation methods rely on distinct preservation mechanisms, which holds the potential for categorical loss or enrichment of proteins based on their biochemical properties. To evaluate the potential introduction of method or storage time specific biases, we evaluated biochemical properties including protein size, isoelectric point (pI) and number of transmembrane helix domains (TMHs) across all samples. In contrast to the overlap analysis shown in Figure 2A and C, for this analysis all proteins identified within an FDR of 5% were considered.

We did not observe any significant differences in pI, protein size or number of predicted transmembrane helices (TMHs) for preservation methods (Fig. 6A-C, Student's t-test, p-value < 0.05) and storage time points (Fig. 6D-F, Student's t-test, p-value < 0.05). Counts within the respective intervals were almost identical for all examined parameters. For example, the average pI of proteins in flash frozen samples was 6.86 while it was 6.85 in RNAlater™ preserved samples and the mean protein length was 296 amino acids for frozen samples and 297 amino acids for RNAlater™ preserved samples. Overall, our analysis showed that we recovered proteins with almost identical biochemical properties for both preservation methods and all storage time points. This suggests that RNAlater™ robustly preserves proteins at room temperature for at least 4 weeks without introducing biases based on biochemical properties.

RNAlater™ for metaproteomics of field-collected samples. Our main finding was that both preservation methods performed equally well and that storage time in RNAlater™ for up to 4 weeks did not impact the quality of the metaproteomes.
However, there are other parameters that need to be taken into account when considering actual field deployment, some of which might vary depending on the sampling location and experimental design (Table 1).