Abstract
In silico quantification of cell proportions from mixed-cell transcriptomics data (deconvolution) requires a reference expression matrix, called a basis matrix. We hypothesized that basis matrices created using only healthy samples from a single microarray platform would introduce biological and technical biases in deconvolution. We show the presence of such biases in two existing matrices, IRIS and LM22, irrespective of the deconvolution method used. Here, we present immunoStates, a basis matrix built using 6160 samples with different disease states across 42 microarray platforms. We found that immunoStates significantly reduced biological and technical biases. We further show that cellular proportion estimates obtained using immunoStates are consistently more correlated with measured proportions than those obtained using IRIS or LM22, across all methods. Importantly, we found that the choice of deconvolution method has virtually no effect once the basis matrix is chosen. Our results demonstrate the need to incorporate biological and technical heterogeneity in a basis matrix to achieve consistently high accuracy.