RT Journal Article SR Electronic T1 The Effects of Nonlinear Signal on Expression-Based Prediction Performance JF bioRxiv FD Cold Spring Harbor Laboratory SP 2022.06.22.497194 DO 10.1101/2022.06.22.497194 A1 Heil, Benjamin J. A1 Crawford, Jake A1 Greene, Casey S. YR 2022 UL http://biorxiv.org/content/early/2022/07/06/2022.06.22.497194.abstract AB Those building predictive models from transcriptomic data are faced with two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex non-linear models such as neural networks will better match complex biological systems. The second, imagining that complex systems will still be well predicted by simple dividing lines prefers linear models that are easier to interpret. We compare multi-layer neural networks and logistic regression across multiple prediction tasks on GTEx and Recount3 datasets and find evidence in favor of both possibilities. We verified the presence of non-linear signal for transcriptomic prediction tasks by removing the predictive linear signal with Limma, and showed the removal ablated the performance of linear methods but not non-linear ones. However, we also found that the presence of non-linear signal was not necessarily sufficient for neural networks to outperform logistic regression. Our results demonstrate that while multi-layer neural networks may be useful for making predictions from gene expression data, including a linear baseline model is critical because while biological systems are highdimensional, effective dividing lines for predictive models may not be.Competing Interest StatementThe authors have declared no competing interest.