Abstract
Understanding genetic control of cell diversification is essential for establishing mechanisms controlling biological complexity. This study demonstrates that the a priori deposition of H3K27me3 associated with gene repression across diverse cell states provides a genome-wide metric that enriches for genes governing fundamental mechanisms underlying biological complexity in differentiation, morphogenesis, and disease. We use this metric in combination with more than 1 million genome-wide data sets from diverse omics platforms to identify cell type-specific regulatory mechanisms underlying diverse organ systems from species across the animal kingdom. From this analysis, we identify and genetically validate multiple novel genes controlling development in diverse chordates including humans and the tunicate, Ciona robusta. This study demonstrates that the conservation of epigenetic regulatory logic provides an effective strategy for utilizing large, diverse genome-wide data to establish quantitative basic principles of cell states to infer cell type-specific mechanisms that underpin the complexity of biological systems.
Introduction
Capturing the information basis of a cell through genome-wide sequencing is a powerful mechanism for understanding the complexities of development and disease. However, the information collated is often limited, reflecting only a snapshot of the steady state of the genome. Enhancing the strategies for predicting regulatory determinants of cell identity has proven to be essential for gleaning novel insights into developmental biology, disease mechanisms and cell reprogramming (Benayoun et al., 2014; Cahan et al., 2014; Rackham et al., 2016). Here, we demonstrate an approach to infer regulatory drivers of any cell state, without the requirement of external reference data or prior knowledge, by analyzing the landscape of diverse chromatin states for distinguishing features of cell specificity. We demonstrate that the a priori probability of the presence of a broad repressive H3K27me3 histone modification mark, which signifies the repressive tendency across a gene locus in diverse cell states, provides a quantifiable metric that strongly predicts regulatory genes governing mechanisms of cell differentiation and organ morphogenesis in health and disease. We show that the repressive tendency can be used to analyze individual transcriptomes of millions of heterogeneous cells simultaneously to infer the cell type-specific regulatory genes controlling somatic cell states across diverse species in the animal kingdom. With new capabilities in studying the genetic state of individual cells, these insights will potentially transform our capacity to understand the mechanistic basis of cellular heterogeneity in health and disease.
Results
Broad histone domains demarcate genes with distinct regulatory roles
We took the approach that the genome is equivalent to an information source that can exist in a continuum to derive a theoretically infinite number of specific cell states. To predict the regulatory determinants of one state, information about the genome from diverse cell states is required to infer how variations in genome activity deliver biological complexity. We focused on the breadth of histone modifications (HMs), which has been shown to be structurally and functionally linked to cell-specific genome architecture and gene regulation (Barski et al., 2007). We used NIH Epigenome Roadmap data (Kundaje et al., 2015), which contains ChIP-seq data for H3K4me3, H3K36me3, H3K27me3, H3K4me1, H3K27ac and H3K9me3 for 111 tissue or cell types (Table S1). To associate HM domains with proximal regulatory functions governing gene expression, we linked HM domains within 2.5 kb to known transcriptional start sites of RefSeq genes. For each of the six HMs, genes were annotated based on the broadest HM domain linked to the gene. For each HM, we found that the top 100 genes with the broadest domain were remarkably consistent between cell types (Figure 1A); however, broad domains of different HMs marked distinct sets of genes (Figure S1A). We further noted that genes marked with broad repressive HMs (i.e. H3K9me3 or H3K27me3) were more consistently shared between cell types than genes marked by other HMs (Figure 1A, inset), suggesting that broad repressive chromatin domains comprise a common strategy for epigenetic control of cell diversification.
We aimed to understand how the breadth of histone domains correlates with genes governing cell identity. To this end, we established a broadly applicable positive gene set for cell type-specific regulatory genes; this set comprises 634 variably expressed transcription factors (TFs) having a coefficient of variation greater than 1 (Table S2) and detected in 46 NIH Epigenome RNA-seq data sets (Perez-Lluch et al., 2015). We used Shannon entropy to quantify cell type-specificity (Schug et al., 2005) and demonstrate that variably expressed TFs are significantly more cell type-specific than non-variably expressed TFs or protein-coding genes (Supplementary Methods). Analysis of RNA-seq data sets from diverse cell and tissue types shows that variably expressed TFs in each sample reflect appropriate tissue or cell type-specific regulatory functions (Figures 1B and 1C inset). Henceforth, variably expressed TFs provide a positive gene set whose enrichment is a performance metric for identifying cell type-specific regulatory genes.
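The two quantities used to build and characterize the positive gene set, the coefficient of variation and Shannon entropy, can be illustrated with a minimal sketch. The function names and the toy expression profiles below are hypothetical, and the choice of the population standard deviation and of log base 2 is an assumption; the text above does not specify either.

```python
import math
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population SD is an assumption)."""
    m = mean(values)
    return pstdev(values) / m if m > 0 else 0.0


def shannon_entropy(values):
    """Entropy (bits) of a gene's expression distribution across samples.

    Low entropy means expression is concentrated in few samples,
    i.e. the gene is more cell type-specific.
    """
    total = sum(values)
    if total == 0:
        return 0.0
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical expression profiles across four samples.
specific_tf = [8.0, 0.0, 0.0, 0.0]   # expressed in one sample only
housekeeping = [1.0, 1.0, 1.0, 1.0]  # uniform across samples

for name, profile in [("specific_tf", specific_tf), ("housekeeping", housekeeping)]:
    cv = coefficient_of_variation(profile)
    print(name, "CV > 1:", cv > 1, "entropy:", shannon_entropy(profile))
```

In this toy case the gene restricted to one sample passes the coefficient-of-variation threshold and has the minimum entropy, whereas the uniformly expressed gene does neither.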
We utilized variably expressed TFs to determine the relationship between cell type-specific regulatory genes and histone broad domains. To this end, all NIH Epigenome histone ChIP-seq peaks, more than thirteen million in total, were ranked by domain breadth and analyzed using Fisher’s exact test to assess enrichment of variably expressed TFs. These data show that H3K27me3 uniquely and significantly enriches for variably expressed TFs within the top 5% of broad domains (Figures 1C, 1D and S1B). This demonstrates that quantification of H3K27me3 broad domains from diverse cell and tissue types provides a powerful metric to reproducibly enrich for cell type-specific regulatory genes governing the biological complexity of diverse cell states.
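As an illustration of this kind of enrichment test, the sketch below computes a one-sided Fisher's exact test p-value for over-representation of a positive gene set among the top-ranked fraction of a list. It is a simplified, gene-level stand-in under stated assumptions, not the authors' pipeline: the function name, the gene identifiers and the use of a single top-5% bin are all hypothetical. The one-sided p-value is obtained as the upper tail of the hypergeometric distribution.

```python
from math import comb


def top_fraction_enrichment(ranked_genes, positive_set, top_fraction=0.05):
    """One-sided Fisher's exact test for positives among the top-ranked genes.

    Returns (number of positives in the top bin, p-value), where the p-value
    is P(X >= k) for X ~ Hypergeometric(N genes, K positives, n drawn).
    """
    n_total = len(ranked_genes)
    n_top = max(1, round(n_total * top_fraction))
    k = sum(1 for g in ranked_genes[:n_top] if g in positive_set)
    n_pos = sum(1 for g in ranked_genes if g in positive_set)
    tail = sum(
        comb(n_pos, x) * comb(n_total - n_pos, n_top - x)
        for x in range(k, min(n_pos, n_top) + 1)
    )
    return k, tail / comb(n_total, n_top)


# Hypothetical ranking of 100 genes, broadest domain first.
ranked = [f"g{i}" for i in range(100)]
positives_at_top = {"g0", "g1", "g2", "g3", "g4"}
positives_at_bottom = {"g95", "g96", "g97", "g98", "g99"}

print(top_fraction_enrichment(ranked, positives_at_top))
print(top_fraction_enrichment(ranked, positives_at_bottom))
```

When every positive gene falls in the top bin the p-value is very small; when none does, the tail sums over all outcomes and the p-value is 1.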
To illustrate the distinctive enrichment of H3K27me3 in regulatory genes as opposed to structural or housekeeping genes (Eisenberg and Levanon, 2013), we extracted expression and chromatin data from cardiomyocytes (Figures 1E and 1F). We show that the transcript abundances of cardiac regulatory genes (i.e. GATA4, GATA6, NKX2-5, TBX5 and TBX20) and structural sarcomere genes (i.e. MYH6, MYH7, MYL2, MYL3 and TNNI3) are all significantly elevated in cardiac cells compared to other cell types, but these genes cannot be distinguished as regulatory or structural by differential expression alone (Figure 1E). Furthermore, focusing on H3K27me3 of only the cardiomyocyte samples is uninformative in distinguishing structural from regulatory genes because these genes all lack repressive chromatin. In contrast, in all cell types except the heart, H3K27me3 domains broader than 30 kb consistently distinguish cardiac regulatory genes from structural genes (Figures 1E and 1F). No other HM analyzed demarcates cell type-specific regulatory genes from structural genes in this manner (Figures 1F and S1C), establishing the rationale that the frequency of H3K27me3 across heterogeneous cell types provides a novel strategy to infer the likelihood of a gene having cell type-specific regulatory function.
Cell type-specific regulatory genes tend to be marked by broad H3K27me3 domains
We established a simple, quantitative logic that leverages the significance of broad H3K27me3 domains for distinguishing regulatory genes. Deposition of broad H3K27me3 domains allows for setting the default gene activity state to “off” such that cell type-specific activity occurs by rare and selective removal of H3K27me3 while all other loci remain functionally repressed (Boyer et al., 2006; Lee et al., 2006). Conversely, genes with housekeeping or non-regulatory roles rarely host broad H3K27me3 domains. For each gene in the genome, across 111 NIH Epigenome cell and tissue types, we calculated (i) the sum of breadths of H3K27me3 domains in base-pairs and multiplied this by (ii) the proportion of cell types in which the gene’s H3K27me3 breadth is within the top 5% of broad domains (Figure 2A). This approach yields a single value for every gene that defines its association with broad H3K27me3 domains, which we call its repressive tendency score (RTS) (Table S3). Using the NIH Epigenome Roadmap data, the RTS is calculated for 99.3% (or 26,833 genes) of all RefSeq genes. To demonstrate that our formulation is agnostic to the composition of cell types, we note that for all genes, the RTS is within one standard deviation of the mean of the bootstrapped empirical distribution derived from 10,000 re-samplings of cell types. Furthermore, we note that the 111 cell types provided sufficient sample size to calculate a stable RTS (Figures S2A and S2B), with a majority of assigned H3K27me3 domains (over 85%) overlapping a single gene (Figures S2D, S2E, S3A and S3B). Importantly, the RTS only requires sufficient subsampling of H3K27me3 from any diverse collection of cell states to establish a stable metric.
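The two-factor product described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name, the nested-dictionary input layout and the toy breadth values are hypothetical, the top 5% is taken per cell type over genes rather than over all peaks, and the raw product is returned without any rescaling (the RTS values quoted in the text suggest a rescaled score, but the rescaling is not described in this section).

```python
def repressive_tendency(breadths_by_cell_type, top_fraction=0.05):
    """Raw repressive tendency per gene.

    breadths_by_cell_type: {cell_type: {gene: H3K27me3 domain breadth in bp}}
    Score = (sum of breadths across cell types)
            * (proportion of cell types where the gene is in the top fraction).
    """
    n_cell_types = len(breadths_by_cell_type)
    total_breadth = {}
    top_count = {}
    for breadths in breadths_by_cell_type.values():
        ranked = sorted(breadths, key=breadths.get, reverse=True)
        n_top = max(1, round(len(ranked) * top_fraction))
        # Genes without any domain never count as "broad" (assumption).
        top_genes = {g for g in ranked[:n_top] if breadths[g] > 0}
        for gene, bp in breadths.items():
            total_breadth[gene] = total_breadth.get(gene, 0) + bp
            if gene in top_genes:
                top_count[gene] = top_count.get(gene, 0) + 1
    return {
        gene: total_breadth[gene] * top_count.get(gene, 0) / n_cell_types
        for gene in total_breadth
    }


# Hypothetical breadths (bp) in three samples; TBX5 is unmarked only in heart.
toy = {
    "liver": {"TBX5": 40000, "MYH6": 0, "ACTB": 500},
    "brain": {"TBX5": 35000, "MYH6": 1000, "ACTB": 0},
    "heart": {"TBX5": 0, "MYH6": 0, "ACTB": 800},
}
rts = repressive_tendency(toy)
print(sorted(rts, key=rts.get, reverse=True))
```

With so few genes per sample the top bin holds a single gene, so the toy data only demonstrate the arithmetic: a gene that is broadly marked in most samples scores far above genes that are marked narrowly or rarely.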
Using RTS values above the inflection point (RTS > 0.03022) of the interpolated RTS curve, we identified a priority set of 1,359 genes that show a significant enrichment for genes underlying cellular diversification including organismal development, pattern specification and multicellular organismal processes (Figure 2B), and show they are cell type-specific (Figure 2C) and lowly expressed (Figure 2D). Among the 1,359 priority genes, we identified 318 TFs, including variably expressed TFs which had a significantly higher RTS overall (mean=0.083) compared to the background (mean=0.006, Figure 2E), in addition to 155 homeobox proteins, 291 non-coding RNA genes (e.g. FENDRR and HOTAIR (Grote and Herrmann, 2013; Rinn et al., 2007)), and 260 genes involved in cell signaling. We also demonstrate that genes with a high RTS are enriched in key regulators of processes underlying gastrulation and organ morphogenesis, comprise members of many of the major signaling pathways, and include genes implicated in pathologies including cardiovascular disease, diabetes, neurological disorders and cancer (Figure 2F and Table S4). Taken together, these data indicate that ranking based on a gene’s repressive tendency generates a simple and effective strategy to enrich for fundamental genetic determinants of biological complexity of cell states underlying health and disease.
Predicting cell type-specific regulatory genes based on H3K27me3
The transcriptome of a cell comprises a small fraction of the genome and represents the signature of structural, housekeeping and regulatory genes underlying a cell state. The regulatory genes controlling the identity, fate and function of a particular cell state are difficult to identify among thousands of expressed genes. To address this, we established a mechanism for integrating genome-wide RTS values with cell type-specific transcriptomic data. Since every gene is assigned a fixed RTS value that hierarchically orders the genome based on regulatory likelihood, we devised a computational approach to integrate the distinctive signature of any cell’s transcriptomic data with the RTS, a method we call TRIAGE (Transcriptional Regulatory Inference Analysis from Gene Expression). TRIAGE theoretically provides a means to identify cell type-specific regulatory genes for any cell type (Figure 3A). For any gene i, the product of the gene’s expression (Yi) and repressive tendency (Ri) gives its discordance score (Di), as defined by:
$$D_i = Y_i \times R_i$$
The discordance score reflects the juxtaposition of a gene’s association with being epigenetically repressed and the observed transcriptional abundance of that gene in the input data. Collectively, TRIAGE introduces a non-linear, gene-specific weight that prioritizes cell type-specific regulatory genes based on the input expression signature of any cellular state. Of importance, this strategy does not require reference to any external data set, uses no arbitrary statistical cutoffs, does not require additional cell type-specific epigenetic data, does not focus on a specific gene type such as TFs, nor does it utilize external databases or prior knowledge to derive its prediction.
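The discordance score, the product of a gene's expression and its repressive tendency, can be illustrated with a minimal sketch. The function names and all numeric values below are hypothetical and chosen only to show the re-ranking effect; expression is used as provided, and any prior transformation of the expression values is not covered here.

```python
def discordance_scores(expression, rts):
    """Multiply each gene's expression by its repressive tendency score.

    Genes without an assigned RTS receive a weight of 0 (assumption).
    """
    return {gene: y * rts.get(gene, 0.0) for gene, y in expression.items()}


def rank_genes(scores):
    """Gene names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical cardiomyocyte-like expression values and RTS values.
expression = {"ACTB": 1200.0, "MYH6": 900.0, "NKX2-5": 40.0, "GATA4": 25.0}
rts = {"ACTB": 0.0005, "MYH6": 0.001, "NKX2-5": 0.45, "GATA4": 0.30}

print("by expression: ", rank_genes(expression))
print("by discordance:", rank_genes(discordance_scores(expression, rts)))
```

In this toy example the housekeeping and structural genes lead the expression ranking, while the two transcription factors move to the top once each value is weighted by its RTS. Because the weight is a fixed per-gene constant, the transformation needs only the input expression vector.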
To demonstrate TRIAGE, we identified known regulatory and structural genes from 5 tissue groups, analyzing H3K27me3 of cell-specific regulatory versus structural genes (Figure 3B). When applied to cell-specific transcriptional data, TRIAGE reduces the relative abundance of structural and housekeeping genes, while enriching for regulatory genes in a cell type-specific manner (Figure 3C). Taken to scale, TRIAGE transformation of all 46 Roadmap cell types results in enrichment of cell type-specific TFs among the top 1% in every cell type. Compared to the expression-based ranking, TRIAGE reduces the relative abundance of housekeeping genes (Figures 3D and S2C). Constructing a tanglegram based on the Pearson distances between Roadmap tissue types (Scornavacca et al., 2011) shows that, relative to the total height of the dendrograms, TRIAGE increased the similarity between samples from the same tissue by ~29% when compared to distances calculated using absolute expression levels (Figure S4A).
Previous work by Benayoun et al. ranked genes based on broad H3K4me3 domains to enrich for cell type-specific regulatory genes (Benayoun et al., 2014). Using diverse cell and tissue types in which expression and H3K4me3 data are available, we demonstrate that TRIAGE outperforms original expression and H3K4me3 broad domains in both sensitivity and precision of identifying cell type-specific regulatory genes (Figures 3E, S4B and S4C).
Identifying cell type-specific regulatory genes from any chordate somatic cell type
Regulatory genes underlying cell identity during development are evolutionarily conserved. Using inter-species gene mapping, we tested whether TRIAGE could identify regulatory drivers of heart development across diverse chordate species including mammals (i.e. Homo sapiens, Mus musculus, and Sus scrofa), bird (Gallus gallus), fish (Danio rerio) and invertebrate tunicate (Ciona robusta) (Figure 3F). In contrast to expression alone, TRIAGE recovered cardiac regulatory genes with high efficiency across all species. More broadly, we used TRIAGE to enrich for relevant tissue morphogenesis biological processes from diverse cell types and species including arthropods (Figure 3G). While TRIAGE is currently devised using human epigenetic data, this suggests that TRIAGE can be used to identify regulatory genes from cell types that are conserved across the animal kingdom.
Dissecting the mechanistic basis of cell heterogeneity at single cell resolution
Recent developments in barcoding and multiplexing have enabled scalable analysis of thousands to millions of cells (Cao et al., 2019). Determining mechanistic information from diverse cell states captured using single-cell analytics remains a challenge. TRIAGE is scalable for studies of cell heterogeneity because it requires no external reference points and therefore provides a distinctive advantage for identifying regulatory control mechanisms one cell transcriptome at a time.
To illustrate this, we analyzed 43,168 cells captured across a 30-day time-course of in vitro cardiac-directed differentiation from human pluripotent stem cells (hPSCs) (Friedman et al., 2018). Analysis of day-30 cardiomyocytes using standard expression data shows that high abundance genes are dominated by housekeeping and sarcomere genes, whereas TRIAGE efficiently identifies regulatory genes governing cardiomyocyte identity including NKX2-5, HAND1, GATA4 and IRX4 within the top 10 most highly ranked genes (Figures 4A and 4B). Importantly, TRIAGE retains highly expressed cell-specific structural genes, providing an integrated readout of genes involved in cell regulation and function (Figure 4C). We used TRIAGE to convert the genes-by-cells matrix comprising ten different subpopulations spanning developmental stages including gastrulation, progenitor and definitive cell types (Figure 4D). In contrast to expression data, which significantly enriches for structural and housekeeping genes, TRIAGE consistently identifies gene sets associated with development of every subpopulation through differentiation (Figures 4E and S5). Lastly, standard -omics analysis pipelines implement differential expression (DE) followed by gene ontology, pathway or network analysis. We show that DE results in variable outcomes depending on the comparison and consistently under-performs against TRIAGE, which identifies population-specific regulatory genes across diverse cell states without any external reference comparisons (Figure 4F).
Predicting regulatory drivers of cell identity using any genome-wide analysis of gene expression
The simplicity of TRIAGE facilitates its use as a scalable application. Variably expressed TFs (Figure 1B) were used as a positive gene set to test enrichment of regulatory genes across diverse tissue types. For each tissue type we plotted the rank position of the peak significance (−log10p) value in a Fisher’s exact test. Using Tabula Muris data of nearly 100,000 cells from 20 different mouse tissues at single-cell resolution (Schaum et al., 2018), TRIAGE consistently enriches for cell type-specific regulatory genes compared to original expression, with no difference between droplet and Smart-seq2 data sets (Figure 4G and Table S5). Using the mouse organogenesis cell atlas (MOCA), which is among the largest single cell data sets generated to date (Cao et al., 2019), we demonstrated that TRIAGE outperformed the expression value alone in prioritizing cell type-specific regulatory genes across more than 1.3 million mouse single-cell transcriptomes (Figure 4H). Lastly, we used benchmarking data for assessing clustering accuracy (Tian, 2018) to assess the performance of TRIAGE using three independent algorithms (i.e. CORE, SC3, and Seurat) and show no difference in accurately assigning cells to the reference (ARI > 0.98) using original expression or TRIAGE-transformed expression (Figure 4I).
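The clustering comparison above is reported as an adjusted Rand index (ARI), which measures agreement between two assignments of the same cells to groups, corrected for chance. A minimal standard-library sketch of the standard ARI formula follows; the function name and the toy label vectors are hypothetical and are not taken from the benchmarking data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(labels_a)
    # Pairs of items placed together in both labelings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (together_both - expected) / (maximum - expected)


# Hypothetical reference cell labels and two clustering results.
reference = [0, 0, 1, 1, 2, 2]
same_groups_renamed = ["x", "x", "y", "y", "z", "z"]
one_cell_misplaced = [0, 0, 1, 1, 2, 1]

print(adjusted_rand_index(reference, same_groups_renamed))
print(adjusted_rand_index(reference, one_cell_misplaced))
```

The index depends only on which cells are grouped together, not on the cluster names, so identical groupings with different labels score 1, and any misassigned cell lowers the score.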
We hypothesized that TRIAGE could be used to study any genome-wide quantitative measurement of gene expression. To test this, TRIAGE was applied using diverse quantitative readouts of gene expression across hundreds of different cell types. TRIAGE vastly outperforms original abundance metrics when measuring chromatin methylation for H3K36me3, a surrogate of RNA polymerase II activity deposited across gene bodies (Barski et al., 2007), collected from the 111 Roadmap samples (Figure 4J). Similarly, cap analysis of gene expression (CAGE), which measures genome-wide 5’ transcription activity, showed significant enrichment of variably expressed TFs using TRIAGE from 329 selected FANTOM5 CAGE samples (Figure 4J and Table S1) (Forrest et al., 2014). Lastly, analysis of a draft map of the human proteome shows that TRIAGE enriches for regulatory drivers of 30 different tissue types from high resolution Fourier transform mass spectrometry data (Kim et al., 2014) (Figure 4J). Taken together, these data illustrate the power of utilizing TRIAGE to predict regulatory drivers of cell states using diverse genome-wide multi-omic endpoints.
Determining the regulatory control points of disease
Strategies for identifying genetic determinants of disease have the potential to guide strategies for predicting or altering the natural course of disease pathogenesis. We analyzed genetic data from melanoma and heart failure (HF) pathogenesis to determine the utility of TRIAGE in identifying regulatory determinants of disease.
Treatment for melanoma has improved with the advent of drugs targeting proliferative cells, but highly metastatic and drug-resistant subpopulations remain problematic. To assess the potential of TRIAGE for informing disease mechanisms, we analyzed single cell RNA-seq data from 1,252 cells capturing a transition from proliferative to invasive melanoma (Tirosh et al., 2016). Among the top ranked genes, TRIAGE consistently outperforms expression in prioritizing genes with known involvement in melanoma proliferation and invasion (Figure 5A and Table S6). Using independently derived positive gene sets for proliferative versus invasive melanoma (Tirosh et al., 2016; Verfaillie et al., 2015), TRIAGE recovers with high sensitivity the genetic signatures of these two cancer states (Figure 5B). Gene set enrichment analysis using TRIAGE identified ETV5 and TFAP2A as associated with proliferative melanoma versus TFAP2C and TBX3 as regulators of invasive melanoma (Figure 5C). TFAP2A and TBX3 have been implicated in proliferative and invasive melanoma respectively (Peres and Prince, 2013; Rambow et al., 2015), whereas ETV5 and TFAP2C were novel predicted regulators. To validate this, we used in vitro nutrient restriction of melanoma cells to trigger a transition into an invasive phenotype (Falletta et al., 2017; Ferguson et al., 2017). In contrast to the expression dynamics of MITF, a master regulator of melanocytic differentiation, TFAP2C is upregulated together with AXL, a receptor tyrosine kinase associated with therapeutic resistance and transition to invasive melanoma (Figures 5D and 5E). These data demonstrate the ability of TRIAGE to effectively identify genetic signatures of functionally distinct cancer cell states without external reference points.
We aimed to assess whether TRIAGE could identify transcriptional signatures of therapeutic interventions in heart failure (HF). Previous studies have shown that the epigenetic reader protein BRD4, a member of the BET (Bromodomain and Extra Terminal) family of acetyl-lysine reader proteins, functions as a critical chromatin co-activator during HF pathogenesis that can be pharmacologically targeted in vivo (Anand et al., 2013; Duan et al., 2017; Spiltoir et al., 2013) to prevent and treat HF by targeting gene programs linked to cardiac hypertrophy and fibrosis (Duan et al., 2017). We analyzed RNA-seq data from adult mouse hearts where pre-established HF (transverse aortic constriction, TAC) was treated with JQ1. TRIAGE prioritized TFs and regulatory genes with known roles in HF pathogenesis (Figure 5F), outperforming expression ranked genes based on stress-associated gene sets (Figure 5G). Importantly, comparison between Sham, TAC and TAC+JQ1 TRIAGE-based ranked genes highlighted a potent anti-fibrotic effect of JQ1 without the use of a canonical differential expression analysis (Figure 5G). Collectively, these data demonstrate the use of TRIAGE as a scalable strategy for studying the mechanistic basis of disease aetiology and therapy.
Identification of novel regulatory drivers of development
Lastly, we set out to demonstrate that TRIAGE can facilitate discovery of novel regulatory genes governing development in vitro and in vivo. Using data from single cell analysis of cardiac differentiation (Friedman et al., 2018), we analyzed sub-populations at day 2. TRIAGE identified known regulatory genes governing sub-population identity among the top 10 highly ranked genes (Figure 6A). Among the TRIAGE-identified genes was SIX3, a member of the sine oculis homeobox transcription factor family (RTS=0.54) (Figures 6A and 6B). Importantly, all pairwise differential expression analyses failed to enrich for SIX3 (Figure S6A). Though the role of SIX3 in neuroectoderm specification has been studied extensively, little is known about its role in other germ layer derivatives (Carl et al., 2002; Lagutin et al., 2003; Steinmetz et al., 2010). Analysis of SIX3 in hPSC in vitro cardiac differentiation shows robust expression in day 2 definitive endoderm (28.7%) and mesoderm (37.5%) cell populations (Figure 6C) with enrichment of SIX3+ cells associated with definitive endoderm (Figures S6B and S6C). Using previously published laser microdissection approaches, we captured the spatiotemporal transcriptional data from germ layer cells of mid-gastrula stage (E7.0) embryos (Peng et al., 2016), with an expanded analysis to include pre- (E5.5-E6.0), early- (E6.5) and late-gastrulation (E7.5) mouse embryos (Figure S6F). Spatiotemporal expression of SIX3 and other family members is observed in the epiblast and neuroectoderm (Figures 6D and S6G), consistent with its known role in these lineages (Carl et al., 2002; Lagutin et al., 2003; Steinmetz et al., 2010), as well as in early endoderm lineages (Figure 6D). Supporting this finding, SIX3 has been identified as a gene distinguishing definitive from visceral endoderm (Sherwood et al., 2007) but no functional studies have validated this finding.
We established CRISPRi loss-of-function hPSCs in which SIX3 transcription is blocked at its CAGE-defined transcription start site (TSS) in a dox-dependent manner (Figures 6E and 6F). Cells were differentiated using monolayer cardiac differentiation and analyzed at day 2 (Figure 6G). SIX3 loss-of-function depleted endoderm and mesendoderm genes (Figure 6H), consistent with FACS analysis showing depletion of CXCR4+/EPCAM+ endoderm cells (Figures 6I-K and S6D). In contrast, FACS analysis of alpha-actinin+ cardiomyocytes showed no difference between SIX3-knockdown cells and dox-treated controls, indicating that loss of SIX3 does not impact mesodermal fates (Figures 6L-N and S6E). Taken together, these data demonstrate a novel role of SIX3 in endoderm differentiation.
We also used TRIAGE to identify novel developmental regulators in a distant chordate species, Ciona robusta. RNA-seq data comprising cell subpopulations captured across a time-course of cardiac development were analyzed with TRIAGE using a customized gene mapping tool to link human to Ciona genes (Figure 6O) (Wang et al., 2019). The top ranked genes based on TRIAGE were analyzed (Figure 6P). RNF220 (RTS=0.30, Figure 6Q), an E3 ubiquitin ligase governing Wnt signaling pathway activity through β-catenin degradation (Ma et al., 2014; Tsoi et al., 2018), was identified as a novel regulatory gene not previously implicated in cardiopharyngeal development. Comparing CRISPR control with RNF220-knockout animals, we demonstrate that Mesp lineage progenitors of control animals form the expected ring of pharyngeal muscle progenitors around the atrial siphon placode, whereas RNF220-knockout embryos showed significant morphogenetic defects. Collectively, these data illustrate that TRIAGE efficiently identifies novel functional regulatory determinants, demonstrating its use for discovering novel biology underlying mechanisms of development.
Discussion
Understanding the genetic determinants of cell diversity is essential for establishing mechanisms of development, disease etiology and organ regeneration, as well as synthetic control of cell states including cell reprogramming. Recent advances in deriving genome-wide data at single cell resolution (Cao et al., 2019; Schaum et al., 2018) as well as computational analysis and prediction algorithms (Benayoun et al., 2014; Cahan et al., 2014; Palpant et al., 2017; Rackham et al., 2016) have revolutionized our capacity to study complex biological systems. This study demonstrates the power of analyzing cell heterogeneity to understand genome regulation at scale and reveals a repressive tendency metric that provides a strong, quantitative prediction value for cell type-specific regulatory genes controlling cell diversification in development and disease. While sufficiently diverse data sets on epigenetic control of cell states are currently available only for human and mouse, we show that the evolutionary conservation of gene regulation enables this quantitative strategy to predict regulatory genes across diverse species in the animal kingdom. We hypothesize that this approach can be applied across H3K27me3 data from diverse cell and tissue types in species where gene expression is governed by the polycomb group complex. While not perfectly conserved through evolution, PRC2 and its regulation of histone methylation are known to govern genes in protists, animals, plants, as well as fungi. The conservation of this regulatory logic provides an effective strategy of utilizing large, diverse genome-wide data to establish quantitative basic principles of cell states to infer cell type-specific mechanisms that underpin the complexity of biological systems.
We anticipate furthermore that this analytic approach can be applied to render customized inference predictions, based on chromatin transition, between diverse healthy and diseased tissues to reveal stress-sensitive loci and novel disease drivers. This conceptual and experimental framework can infer regulatory genes governing theoretically any cell state, and has broad utility for studies in genome regulation of cell identity in health and disease.
AUTHOR CONTRIBUTIONS
WJS: Developed the computational basis for the study, performed data analysis and wrote the manuscript.
ES: Assisted in experimental and computational design for the study, performed data analysis, carried out functional genetic studies in hPSCs and wrote the manuscript.
JX: Assisted with computational analysis and developed web interactive interface.
MA: Performed computational analysis on HF pathogenesis data.
GA: Performed computational analysis on HF pathogenesis data.
SS: Assisted with the computational analysis on different single-cell data platforms.
BB: Performed computational analysis on melanoma studies.
YS: Performed computational analysis on Mouse Organogenesis Cell Atlas data.
BV: Performed functional analysis on Ciona and validated the findings.
GP: Assisted with spatiotemporal transcriptomic profiling of mouse gastrulation.
NJ: Assisted with spatiotemporal transcriptomic profiling of mouse gastrulation.
YW: Helped with computational analysis of epigenetic data.
MP: Assisted with analysis and interpretation of melanoma data.
AS: Carried out experiments involving melanoma analysis.
YC: Carried out experiments involving melanoma analysis.
PT: Supervised work on spatiotemporal transcriptomic profiling of mouse gastrulation.
LC: Performed functional analysis on Ciona and validated the findings.
QN: Provided assistance to implement TRIAGE on single-cell data sets.
MB and NJP: Supervised the project, raised funding, and wrote the manuscript.
DECLARATION OF INTERESTS
The authors declare no competing interests.
ACKNOWLEDGEMENTS
E.S. acknowledges funding by Children’s Hospital Foundation Queensland (Award Reference Number: 50268). B.V. acknowledges funding by American Heart Association grant #18PRE33990254. The Ciona work was supported by NIH/NHLBI award R01 HL108643 to L.C. M.A. was supported by the Swiss National Science Foundation (project P2LAP3_178056). P.P.L.T. is supported by the National Health and Medical Research Council of Australia (Grant 1110751). N.P. is supported by the National Health and Medical Research Council of Australia (Grant APP1143163) and the Australian Research Council (Grant SR1101002).
Footnotes
Contact information: Nathan Palpant, Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia, T: 61 0439 241 069, E: n.palpant@uq.edu.au; Mikael Bodén, School of Chemistry and Molecular Biology, The University of Queensland, Brisbane, Australia, T: 61 07 336 51307, E: m.boden@uq.edu.au