Summary
The explosive growth of regulatory hypotheses from single-cell datasets demands accurate prioritization of candidates for in vivo validation. However, current computational methods emphasize overall accuracy in regulatory network reconstruction rather than prioritizing a limited set of causal transcription factors (TFs) that can feasibly be tested. We developed Haystack, a hybrid computational-biological algorithm that combines active learning with optimal transport theory to nominate and validate high-confidence causal hypotheses. Our approach efficiently identifies and prioritizes transient but causally active TFs in cell lineages. We applied Haystack to single-cell observations, guiding efficient and cost-effective in vivo validations that reveal causal mechanisms of cell differentiation in Drosophila gut and blood lineages. Notably, all the TFs shortlisted for the final, imaging-based assays were validated as drivers of differentiation. Haystack’s hypothesis-prioritization approach will be crucial for validating concrete discoveries from the increasingly vast collection of low-confidence hypotheses from single-cell transcriptomics.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
This revision adds detail about the method, along with new computational analyses. The text has been rewritten for clarity, and some of the figures have also been edited.