PT - JOURNAL ARTICLE
AU - Wiemann, Jasmina
AU - Heck, Philipp R.
TI - Quantifying the impact of sample, instrument, and data processing on biological signatures detected with Raman spectroscopy
AID - 10.1101/2023.06.01.543279
DP - 2023 Jan 01
TA - bioRxiv
PG - 2023.06.01.543279
4099 - http://biorxiv.org/content/early/2023/06/05/2023.06.01.543279.short
4100 - http://biorxiv.org/content/early/2023/06/05/2023.06.01.543279.full
AB - Raman spectroscopy is a popular tool for characterizing complex biological materials and their geological remains [1-10]. Ordination methods, such as Principal Component Analysis (PCA), rely on spectral variance to create a compositional space [1], the ChemoSpace, which groups samples based on spectroscopic manifestations that reflect different biological properties or geological processes [1-7]. PCA reduces the dimensionality of complex spectroscopic data and facilitates the extraction of informative features into data formats suitable for downstream statistical analyses, and thus represents an essential first step in the development of diagnostic biosignatures. However, there is presently no systematic survey of the impact of sample, instrument, and spectral processing on the occupation of the ChemoSpace. Here, the influence of sample count, signal-to-noise ratios, spectrometer decalibration, baseline subtraction routines, and spectral normalization on ChemoSpace grouping is investigated using synthetic spectra. Increasing sample size improves the separation of sample groups in the ChemoSpace; however, a stable occupation pattern can be achieved with fewer than 10 samples per group. Systemic noise of differing amplitude and frequency, which can be introduced by the instrument or the sample [11,12], is eliminated by PCA even when spectra of differing signal-to-noise ratios are compared.
Routine offsets (±1 cm⁻¹) in spectrometer calibration contribute less than 0.1% of the total spectral variance captured in the ChemoSpace and do not obscure biological information. Standard adaptive baselining, together with normalization, increases spectral comparability and facilitates the extraction of informative features. The ChemoSpace approach to biosignatures represents a powerful tool for exploring, denoising, and integrating molecular biological information from modern and ancient organismal samples.
Competing Interest Statement: The authors have declared no competing interest.
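
The workflow summarized in the abstract (synthetic spectra, baseline subtraction, normalization, then PCA) can be illustrated with a minimal Python sketch. Nothing here is taken from the authors' code: the peak positions, peak widths, noise level, linear background, and all function names are assumptions chosen for illustration, and the straight-line baseline fit is a much simpler stand-in for the adaptive baselining the abstract mentions.

```python
import numpy as np

def gaussian_peak(x, center, width, height):
    """Single Gaussian band on the wavenumber axis x."""
    return height * np.exp(-0.5 * ((x - center) / width) ** 2)

def make_spectra(x, centers, n, rng, noise_sd=0.02, shift_sd=1.0):
    """Simulate n spectra of one group (all parameter values are hypothetical)."""
    spectra = []
    for _ in range(n):
        shift = rng.normal(0.0, shift_sd)  # calibration offset, in cm^-1
        signal = sum(gaussian_peak(x, c + shift, 15.0, 1.0) for c in centers)
        # Random linear background standing in for fluorescence
        background = rng.uniform(0.0, 0.5) + rng.uniform(0.0, 5e-4) * (x - x[0])
        noise = rng.normal(0.0, noise_sd, x.size)
        spectra.append(signal + background + noise)
    return np.array(spectra)

def subtract_baseline(x, y, degree=1):
    """Remove a fitted polynomial (simplified stand-in for adaptive baselining)."""
    coeffs = np.polyfit(x, y, degree)
    return y - np.polyval(coeffs, x)

def normalize(y):
    """Scale a spectrum to unit Euclidean length (vector normalization)."""
    return y / np.linalg.norm(y)

def pca(data):
    """PCA via SVD of the column-centered data matrix."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                       # sample coordinates on each component
    explained = s ** 2 / np.sum(s ** 2)  # fraction of variance per component
    return scores, explained

rng = np.random.default_rng(0)
x = np.arange(500.0, 1801.0, 1.0)  # wavenumber axis, cm^-1
n_per_group = 8                    # fewer than 10 samples per group

group_a = make_spectra(x, [800.0, 1200.0], n_per_group, rng)
group_b = make_spectra(x, [1000.0, 1600.0], n_per_group, rng)
raw = np.vstack([group_a, group_b])

processed = np.array([normalize(subtract_baseline(x, y)) for y in raw])
scores, explained = pca(processed)

print("PC1 scores, group A:", np.round(scores[:n_per_group, 0], 3))
print("PC1 scores, group B:", np.round(scores[n_per_group:, 0], 3))
print("Variance fraction on PC1:", round(float(explained[0]), 3))
```

Changing `n_per_group`, `noise_sd`, or `shift_sd` and re-running gives a rough feel for how sample count, noise, and calibration offsets move the groups in the score space. It does not reproduce the quantitative results reported in the abstract.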