Abstract
Sample collection can significantly affect measurements of relative lipid concentrations in cell line panels, masking the intrinsic biological differences of interest between cell lines. Most quality control steps in lipidomic data analysis focus on controlling technical variation. Correcting for the total amount of biological material remains an additional challenge for cell line panels. Here, we investigated how to normalize lipidomic data acquired from multiple cell lines to correct for differences in sample biomass.
We studied how data normalization and transformation steps commonly used during analysis influenced the resulting lipid data distributions. We compared normalization by biological properties, such as cell count or total protein concentration, with statistical and data-based approaches, such as median, mean, or probabilistic quotient normalization. We used intraclass correlation to estimate how the similarity between replicates changed after normalization.
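The data-based normalizations and the replicate-similarity measure named above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, each sample is assumed to be a list of positive lipid intensities in a fixed lipid order, the probabilistic quotient step uses the per-lipid median across samples as its reference (the prior total-intensity scaling often applied before it is omitted), and the intraclass correlation is assumed to be the one-way random-effects form ICC(1,1) with a balanced number of replicates per cell line, since the abstract does not state which form was used.

```python
from statistics import mean, median

def scale_rows(samples, factors):
    """Divide every value in each sample by that sample's scaling factor."""
    return [[x / f for x in row] for row, f in zip(samples, factors)]

def median_normalize(samples):
    """Scale each sample by its own median lipid intensity."""
    return scale_rows(samples, [median(row) for row in samples])

def mean_normalize(samples):
    """Scale each sample by its own mean lipid intensity."""
    return scale_rows(samples, [mean(row) for row in samples])

def pqn_normalize(samples):
    """Probabilistic quotient normalization against a median reference profile."""
    # Reference profile: per-lipid median across all samples.
    reference = [median(col) for col in zip(*samples)]
    # Dilution factor per sample: median of its quotients to the reference.
    factors = [median(x / r for x, r in zip(row, reference)) for row in samples]
    return scale_rows(samples, factors)

def icc_oneway(replicates_by_line):
    """ICC(1,1) for one lipid; maps cell line -> replicate values (balanced design assumed)."""
    groups = list(replicates_by_line.values())
    n, k = len(groups), len(groups[0])
    grand = mean(v for g in groups for v in g)
    # Between-cell-line and within-cell-line mean squares from a one-way ANOVA.
    msb = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    msw = sum((v - mean(g)) ** 2 for g in groups for v in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

An intraclass correlation close to 1 means that most of the variance in a lipid lies between cell lines rather than between replicates of the same cell line, so comparing this value before and after normalization indicates whether replicates have become more similar.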
Normalizing lipidomic data by cell count improved similarity between replicates, but only for a study with cell lines with similar morphological phenotypes. For cell line panels with multiple morphologies collected over a longer time, neither cell count nor protein concentration was sufficient to increase the similarity of lipid abundances between replicates of the same cell line. Data-based normalizations increased these similarities, but also created artifacts in the data caused by a bias towards the large and variable lipid class of triglycerides. This artifact was reduced by normalizing for the abundance of only structural lipids. We conclude that there is a delicate balance between improving the similarity between replicates and avoiding artifacts in lipidomic data, and we emphasize the importance of an appropriate normalization strategy in studying biological phenomena using lipidomics.
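The idea of normalizing to structural lipids only can be illustrated with a short sketch. It rests on assumptions not stated in the abstract: the set of classes treated as structural is a hypothetical placeholder, "abundance" is read as the summed intensity of those classes, and the function name is invented for illustration.

```python
# Hypothetical set of structural (membrane) lipid classes; the source does not list them.
STRUCTURAL_CLASSES = {"PC", "PE", "PS", "PI", "SM"}

def structural_normalize(sample, lipid_classes, structural=STRUCTURAL_CLASSES):
    """Divide every lipid in a sample by the summed abundance of structural lipids."""
    # Only lipids whose class is in the structural set contribute to the scaling factor,
    # so a large or variable triglyceride pool does not shift the other lipids.
    total = sum(x for x, c in zip(sample, lipid_classes) if c in structural)
    return [x / total for x in sample]

# Two toy samples with identical membrane lipids but very different triglyceride (TG) content.
classes = ["PC", "PE", "TG"]
low_tg = [10.0, 10.0, 5.0]
high_tg = [10.0, 10.0, 500.0]
normalized_low = structural_normalize(low_tg, classes)
normalized_high = structural_normalize(high_tg, classes)
```

In this toy case the normalized PC and PE values are the same in both samples, whereas dividing by the total intensity of all lipids would make them appear far lower in the triglyceride-rich sample.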
Competing Interest Statement
The authors have declared no competing interest.