RT Journal Article
SR Electronic
T1 Predictive and robust gene selection for spatial transcriptomics
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.05.13.491738
DO 10.1101/2022.05.13.491738
A1 Ian Covert
A1 Rohan Gala
A1 Tim Wang
A1 Karel Svoboda
A1 Uygar Sümbül
A1 Su-In Lee
YR 2022
UL http://biorxiv.org/content/early/2022/12/26/2022.05.13.491738.abstract
AB A prominent trend in single-cell transcriptomics is providing spatial context alongside a characterization of each cell’s molecular state. This typically requires targeting an a priori selection of genes, often covering less than 1% of the genome, and a key question is how to optimally determine the small gene panel. Reference data from these methods covering the whole genome is unavailable, and using single-cell RNA sequencing (scRNA-seq) datasets as a surrogate can result in suboptimal gene panels due to the fundamentally different data distributions across technologies. We address these challenges by introducing a flexible deep learning framework, PERSIST, to identify informative gene targets for spatial transcriptomics studies by leveraging existing scRNA-seq data. Using datasets spanning different brain regions, species, and scRNA-seq technologies, we show that PERSIST reliably identifies gene panels that provide more accurate prediction of the genome-wide expression profile, thereby capturing more information with fewer genes. Furthermore, PERSIST can be adapted to meet specific biological goals, such as classifying cell types or discerning neuronal electrical properties. Finally, via a simulation study based on a recent in situ hybridization-based dataset, we demonstrate that PERSIST’s binarization of gene expression levels enables models trained on scRNA-seq data to generalize with input data obtained using spatial transcriptomics, despite the complex domain shift between these technologies. Competing Interest Statement: The authors have declared no competing interest.