ABSTRACT
Delineating gene regulatory networks that orchestrate cell-type specification is an ongoing challenge for developmental biology studies. Single-cell analyses offer opportunities to address these challenges and accelerate discovery of rare cell lineage relationships and mechanisms underlying hierarchical lineage decisions. Here, we describe the molecular analysis of pancreatic endocrine cell differentiation using single-cell gene expression and chromatin accessibility assays coupled to genetic labeling and cell sorting. We uncover transcription factor networks that delineate β-, α- and δ-cell lineages. Through genomic footprint analysis we identify transcription factor-regulatory DNA interactions governing pancreatic cell development at unprecedented resolution. Our analysis suggests that the transcription factor Neurog3 may act as a pioneer transcription factor to specify the pancreatic endocrine lineage. These findings could improve protocols to generate replacement endocrine cells from renewable sources, like stem cells, for diabetes therapy.
INTRODUCTION
More than 400 million people are living with diabetes worldwide. Diabetes results from loss or dysfunction of hormone-producing endocrine islet cells in the pancreas, whose principal role is to regulate circulating glucose levels. Recent advances in tissue engineering to replace non-functioning endocrine cells have renewed interest in understanding the molecular mechanisms of pancreatic endocrine cell differentiation (Siehler et al., 2021).
A key event during endocrine pancreas development is expression of the transcription factor Neurog3 in select pancreatic duct cells (Gradwohl et al., 2000). Neurog3 specifies endocrine progenitor cells, which differentiate into hormone producing cells that delaminate from the duct and aggregate to form pancreatic islets (reviewed in Arda et al., 2013; Bastidas-Ponce et al., 2017; Benitez et al., 2012). Several distinct endocrine cell types aggregate within pancreatic islets, including insulinpos β-cells, glucagonpos α-cells, somatostatinpos δ-cells, ghrelinpos ε-cells, and PPYpos γ-cells. Mice lacking pancreatic Neurog3 fail to develop endocrine islet cells (Gradwohl et al., 2000; Gu et al., 2002; Schwitzgebel et al., 2000; Smith et al., 2004). In one model based on lineage tracing (Kopinke et al., 2011; Solar et al., 2009), Neurog3pos cells originate from a “bi-potent progenitor” with potential to generate either ducts or islets (reviewed in Bankaitis et al., 2015).
Emerging single-cell technologies are revolutionizing developmental biology by enabling quantitative molecular analysis of transient, rare cell types in developing organs, especially lineage progenitor cells. Recently, several groups used single-cell RNA sequencing (scRNA-Seq) to catalog dynamic transcriptome changes during mouse pancreas development and endocrine cell differentiation (Bastidas-Ponce et al., 2019; Byrnes et al., 2018; Krentz et al., 2018; Qiu et al., 2017a; Sharon et al., 2019; Yu et al., 2019). Some studies suggested that endocrine progenitor subtypes exist or are biased towards specific hormone lineages (Liu et al., 2019; Scavuzzo et al., 2018; Yu et al., 2019). While these reports contributed substantially to our understanding of endocrine pancreas development, no study has yet reported specification of the crucial islet δ-cell lineage (Arrojo e Drigo et al., 2019), or investigated chromatin conformation changes by overcoming cell labeling ambiguities related to Neurog3-GFP cells (Lee et al., 2002).
To address these unmet needs, we used an integrative approach that combined cell surface marker-based sorting, genetic labeling, chromatin analysis, and single-cell assays to elucidate molecular mechanisms underlying gene expression changes during endocrine pancreas differentiation. By establishing pseudotime trajectories for hormone lineages, including islet δ-cells, we identified unique combinations of transcription factors guiding differentiation of the β-, α-, and δ-lineages. Chromatin accessibility analysis using ATAC-seq unexpectedly revealed extensive similarities between duct cells and those that activate Neurog3. We discovered genomic regions that undergo substantial transformation during development and identified enriched motifs in open chromatin specific to differentiation stages. We also applied powerful genomic footprint analysis to identify transcription factor activity in open chromatin regions and found evidence of specific transcription factor footprints linked to their associated motifs. Our analysis suggests a revised model for endocrine pancreas development by providing evidence for direct development of this lineage from duct cells, and the absence of a bipotent progenitor.
Our results demonstrate the feasibility of using a combined scRNA-seq and ATAC-seq analysis to map gene regulatory networks that define pancreatic cell lineages. We anticipate our findings and those from similar work should foster efforts aiming to direct development of renewable cell sources, like stem cells, for tissue replacement and regeneration.
RESULTS
Single-cell transcriptomic analysis of endocrine pancreas development
To understand gene expression dynamics during pancreatic endocrine cell differentiation, we performed scRNA-Seq on cells isolated from mouse embryonic day 15.5 (E15.5) and E17.5 pancreas. We used the Neurog3-eGFP knock-in and Neurog3-Cre,Rosa-mTmG mice combined with cell surface markers to isolate specific populations from the embryonic pancreas (see Methods) (Lee et al., 2002; Muzumdar et al., 2007; Sugiyama et al., 2007). We followed the Smart-Seq2 protocol to sequence mRNAs from single cells sorted into 96-well plates by fluorescence-activated cell sorting (FACS, Supplementary Figure 1A). Using this strategy, we collected and sequenced a total of 604 cells: 461 from E15.5 and 143 from E17.5 pancreas.
After initial read processing to count transcripts for each gene in each cell (Supplementary Table 1), we used Monocle2, a single-cell analysis tool, for downstream cell clustering and trajectory analysis (Supplementary Figure 2). Unsupervised clustering organized cells based on transcriptome similarity, revealing a recognizable sequence of pancreatic endocrine cell differentiation (Figure 1A, Supplementary Figure 1B). This developmental process included a progenitor cluster expressing high levels of Neurog3, a transitioning, early endocrine cell cluster, a definitive endocrine cluster marked by high levels of Chga expression, and a cluster of exocrine cells marked by Cpa1 expression (Figure 1B). We also found a small cluster of mesenchymal cells (14 cells, < 3% of total cells), which were excluded from further analysis.
To delineate gene expression programs involved in endocrine cell development, we aligned cells in a pseudotime trajectory based on quantitative gene expression profiles that change continuously in differentiating cells. This analysis placed all cells in a single trajectory that corroborated the known progression of duct cells into Neurog3pos progenitors, followed by hormone-expressing endocrine cells (Figure 1C). We found more than 2,500 genes whose expression changed significantly along this pseudotime trajectory (q-value < 0.05). k-means analysis partitioned these differentially expressed genes into distinct gene clusters (Figure 1D, Supplementary Table 2). To better visualize the gene expression trends in each cluster, we used LOESS smoothing along pseudotime (Figure 1E; Methods). GO term analysis identified enriched biological process terms in these clusters relevant to pancreatic differentiation (FDR < 0.2, Figure 1E, Supplementary Table 3; Arda et al., 2013; Bastidas-Ponce et al., 2017).
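The q-value filter described above can be illustrated with a short sketch. It assumes that q-values are Benjamini-Hochberg adjusted p-values; the text does not state which correction was applied, and the gene names and p-values below are hypothetical placeholders rather than results from this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    # Indices of the tests sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical per-gene p-values from a test of pseudotime dependence.
p_by_gene = {"Neurog3": 0.0001, "Chga": 0.004, "Cpa1": 0.02,
             "GeneX": 0.2, "GeneY": 0.9}
genes = list(p_by_gene)
q = benjamini_hochberg([p_by_gene[g] for g in genes])

# Keep genes that pass the q-value < 0.05 cutoff used in the text.
significant = [g for g, qv in zip(genes, q) if qv < 0.05]
print(significant)
```

Only genes that pass this cutoff would then be carried forward to clustering and smoothing along pseudotime.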
Cluster 1 included genes that are expressed at high levels at the start of the pseudotime trajectory, then decline significantly or are extinguished as cells differentiate into the endocrine lineages. These genes included known regulators of multipotent pancreatic progenitor or exocrine cells (Ptf1a, Hes1, Notch1, Rbpj), the cell cycle (Mki67, Ccna2, Cdk1), and factors involved in maintenance of chromosome organization or covalent chromatin modifications (Smc4, Ezh2 and Ctcf). Cluster 2 genes had a similar trend, although their expression remained detectable in endocrine cells. These include genes regulating RNA binding and splicing, translation initiation, and ribonucleoprotein complexes. Cluster 3 genes are mainly expressed in endocrine progenitor cells and trend similarly to Neurog3 expression; they include Pax4, Tox3, and Cbfa2t3. Most Cluster 3 transcripts were only detectable transiently in progenitor cells, then extinguished in endocrine cells. Cluster 3 was associated with GO terms related to cell differentiation and endocrine pancreas development (Supplementary Table 3).
Clusters 4-6 contained genes whose expression increased following Neurog3 induction. Cluster 4 genes included Chga, Pcsk2, Pax6, Iapp, Neurod1 and Isl1, and were turned on shortly after Neurog3 expression peaked, in early endocrine cells that still lack mRNAs encoding the principal islet hormones. Clusters 5 and 6 include the hormone genes Ins1, Ins2, Ppy, Sst and Gcg, whose expression peaks in endocrine cells. These clusters also included genes involved in vesicle-mediated transport, ion transport, response to ER stress, regulation of insulin secretion, and exocytosis. Cluster 7 contains genes enriched with functions in the mitochondrial respiratory chain complex, proton transport, and ATP synthesis. Taken together, pancreatic endocrine cell specification involves highly dynamic gene regulatory programs comprising multiple groups of genes with distinct functions.
Analysis of pancreatic endocrine progenitors
Prior studies reported the existence of distinct Neurog3pos endocrine progenitor subtypes (Liu et al., 2019; Scavuzzo et al., 2018; Yu et al., 2019). To investigate the heterogeneity in Neurog3pos progenitor cells, we focused on the cells expressing Neurog3 transcript in our dataset and visualized them using the t-SNE method. This analysis identified three clusters based on Neurog3 transcript abundance, designated high, medium and low, though none of the clusters split into visually distinct groups on the t-SNE projection (Figure 2A). The Neurog3hi cells had the highest Neurog3 levels compared to other clusters (Figure 2B), likely the result of increased Neurog3 transcription that occurs during the secondary transition of endocrine differentiation (Schwitzgebel et al., 2000). Less than 10% of the Neurog3hi cells had detectable Chga expression (Figure 2C). In Neurog3med and Neurog3lo cells Neurog3 transcript levels decreased, while Chga levels increased (Figure 2C). Thus, the observed ‘transcriptional heterogeneity’ in Neurog3pos cells is a direct reflection of advancing development. Moreover, these data argue against a model where endocrine progenitor cells randomly develop from cells with heterogeneous Neurog3 levels. When we analyzed the expression of individual hormone genes, we found that the number of cells expressing Ins1, Ins2, Gcg, or Sst increased as cells transitioned from Neurog3hi to Neurog3lo progenitors, with Sst appearing only in the Neurog3lo cluster (Figure 2D). Additionally, we investigated the number of cells simultaneously expressing one, two, or three of these hormone genes and found that the number of cells co-expressing multiple hormone genes increased as Neurog3 expression decreased. For instance, none of the Neurog3hi cells were polyhormonal, whereas 18% of Neurog3lo cells expressed two, and 2% expressed all three hormone genes (Figure 2E).
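The co-expression tally described above can be sketched as follows. This is a minimal illustration under two assumptions that are not stated in the text: Ins1 and Ins2 are collapsed into a single hormone (insulin), and a transcript counts as detected when it reaches a hypothetical threshold `min_count`. The toy cells are invented for the example.

```python
from collections import Counter

# Ins1 and Ins2 are treated as one hormone here (assumption).
HORMONE_OF_GENE = {"Ins1": "insulin", "Ins2": "insulin",
                   "Gcg": "glucagon", "Sst": "somatostatin"}


def hormones_expressed(cell_counts, min_count=1):
    """Return the set of hormones whose transcripts reach min_count in a cell."""
    return {HORMONE_OF_GENE[gene]
            for gene, count in cell_counts.items()
            if gene in HORMONE_OF_GENE and count >= min_count}


def coexpression_fractions(cells):
    """Fraction of cells expressing 0, 1, 2 or 3 hormones."""
    tally = Counter(len(hormones_expressed(cell)) for cell in cells)
    return {n: tally.get(n, 0) / len(cells) for n in range(4)}


# Toy cells with hypothetical transcript counts (not data from the study).
toy_cells = [
    {"Neurog3": 40, "Ins1": 0, "Gcg": 0},
    {"Neurog3": 5, "Ins1": 3, "Ins2": 7},
    {"Neurog3": 2, "Ins2": 4, "Gcg": 9},
    {"Neurog3": 1, "Ins1": 2, "Gcg": 6, "Sst": 3},
]
fractions = coexpression_fractions(toy_cells)
print(fractions)
```

Running the same tally separately on the Neurog3hi, Neurog3med and Neurog3lo groups would give the per-group polyhormonal fractions of the kind reported in Figure 2E.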
To investigate whether there is transcriptional heterogeneity in Neurog3pos endocrine progenitors isolated from different developmental stages, we examined all Neurog3pos cells by incorporating the embryonic stage information onto the clusters (Figure 2F). We did not observe distinct clustering of E15.5 and E17.5 Neurog3pos endocrine progenitors; rather, the cells were arranged coincident with their developmental stage (Figure 2F). When temporally ordering Neurog3pos cells via pseudotime analysis, the continuous developmental progression was apparent in a single trajectory, without any branching (Figure 2G). Taken together, in our dataset we did not find evidence for lineage biases or subtypes in endocrine progenitors isolated from different embryonic time points. We found that nascent endocrine cells may transiently co-express mRNAs encoding multiple hormones in an intermediate ‘polyhormonal’ state preceding branch specification.
Single-cell trajectories defining endocrine cell type specification
While Neurog3 is necessary and sufficient to establish the pancreatic endocrine lineage, the mechanisms underlying subsequent endocrine lineage diversification are not well established. Other studies using single-cell approaches successfully delineated β- and α-cell branches of islet endocrine cell differentiation, but failed to identify a clear branch for δ-cell specification (Byrnes et al., 2018; Liu et al., 2019; Qiu et al., 2017a; Scavuzzo et al., 2018; Sharon et al., 2019; Yu et al., 2019). In our data, an unsupervised approach including all cells also did not yield trajectories defining individual hormone lineages (Figure 1C). We reasoned that when all cells are included, the substantial change in gene expression programs at the onset of Neurog3 activation might hinder the discovery of less pronounced differences in the initial β-, α-, and δ-cell lineage decisions. To circumvent this issue, we focused analysis on cells after Neurog3 peak expression (Supplementary Figure 3) and performed semi-supervised clustering with marker gene information (Qiu et al., 2017b). Briefly, endocrine progenitors, β-, α-, and δ-cells were pre-assigned based on marker genes before attempting clustering. A prior study used a similar approach to resolve mixed hematopoietic lineages (Iterative Clustering and Guide-gene Selection, Olsson et al., 2016). We then performed iterative rounds of trajectory analysis, sequentially removing cells already assigned to an endocrine cell branch in each iteration, until all branches were identified (Figures 3A-B). This approach successfully partitioned β-, α- and δ-cells into nearly exclusive, specific branches (Figure 3C), suggesting that expert curation can overcome some limitations of trajectory analysis (also see Discussion).
TF networks regulating islet cell lineage gene expression
To reveal the gene expression changes underlying distinct trajectories of endocrine cell specification, we performed differential gene expression analysis between cells assigned to the β-, α- and δ-lineages. We defined the lineages as beginning from the duct cells and ending with hormone expressing endocrine cells (Figures 3A-B, Supplementary Table 4). We focused our analysis on transcription factors (TFs) due to their well-established role in determining cell fates. This analysis revealed 145 TFs whose expression changed significantly during endocrine cell differentiation (Supplementary Figure 4). We visualized how these TFs may be regulating distinct lineages by constructing a network based on TF expression patterns in each cell type (duct, β-, α- and δ-cells) or state (early progenitor, late progenitor; Supplementary Table 5, also see Methods for details). For instance, Hes1 was detected in duct cells, and thus was connected to the node representing the duct cell.
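The construction of such a TF-to-state network can be sketched in a few lines. This is only an illustration: the rule used here (connect a TF to a cell type or state when it is detected in at least `min_fraction` of that state's cells) and the threshold value are assumptions, since the actual criteria are given in the Methods and not in this section. The toy counts are hypothetical.

```python
from collections import defaultdict


def fraction_expressing(cells, gene, min_count=1):
    """Fraction of cells in which `gene` reaches `min_count` transcripts."""
    return sum(1 for cell in cells if cell.get(gene, 0) >= min_count) / len(cells)


def build_edges(cells_by_state, tfs, min_fraction=0.25):
    """Return (tf, state) edges for TFs detected in >= min_fraction of a state."""
    return [(tf, state)
            for tf in tfs
            for state, cells in cells_by_state.items()
            if fraction_expressing(cells, tf) >= min_fraction]


def states_by_tf(edges):
    """Group the edge list into {tf: sorted list of connected states}."""
    grouped = defaultdict(list)
    for tf, state in edges:
        grouped[tf].append(state)
    return {tf: sorted(states) for tf, states in grouped.items()}


# Toy expression data (hypothetical counts, not from the study).
toy = {
    "duct": [{"Hes1": 5}, {"Hes1": 3}, {"Hes1": 0}],
    "beta": [{"Nkx6-1": 4, "Isl1": 2}, {"Nkx6-1": 6, "Isl1": 1}],
    "alpha": [{"Arx": 3, "Isl1": 2}, {"Arx": 0, "Isl1": 5}],
}
edges = build_edges(toy, ["Hes1", "Nkx6-1", "Arx", "Isl1"])
network = states_by_tf(edges)

# TFs connected to a single node versus TFs shared between nodes.
single_state_tfs = sorted(tf for tf, s in network.items() if len(s) == 1)
shared_tfs = sorted(tf for tf, s in network.items() if len(s) > 1)
print(single_state_tfs, shared_tfs)
```

In this toy example Hes1 connects only to the duct node, while Isl1 connects to more than one endocrine node.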
Topological examination of the TF expression-cell state interaction network revealed three network patterns. In Network Pattern 1, we found TFs highly specific to a single lineage. For example, 92% of cells in the β-cell lineage express Nkx6-1 and 71% of α-cells express Arx. Nkx6-1 is thought to repress transcription of Arx, which specifies the α-cell lineage; conversely, Arx is postulated to repress transcription of Nkx6-1, which specifies the β-cell lineage (Schaffer et al., 2013). We found that Smarca1 is highly specific to the α-cell lineage, and this is consistent with recent reports of Smarca1 activation during α-cell development, prior to Gcg expression (Byrnes et al., 2018; Yu et al., 2019). Smarca1 is an ATP-dependent chromatin remodeler, which can be selectively recruited to cell type-specific enhancer elements (Vierbuchen et al., 2017). A second TF, Etv1, is a Neurog3 target (Benitez et al., 2014), and in our data Etv1 is highly specific to the fetal α-cell lineage, suggesting that this TF has a functional role in α-cell development. In our network, we confirmed that Hhex is specific to the δ-lineage (Zhang et al., 2014), and found additional factors. Zbtb20 has increased expression in δ-cells relative to β- and α-cells, which to our knowledge has not been reported before. Instead, Zbtb20 was recently identified as a TF upregulated in the α-cell lineage (Yu et al., 2019). Because the δ-lineage was not defined in that report, it is possible that the uncategorized δ-cells aligned with the α-lineage instead. Other TFs that are highly specific to the δ-cell lineage but with no known functions include Zfhx2, Rere, and Cxxc4.
In Network Pattern 2, we found TFs that are expressed in multiple cell types or states. For instance, the high mobility group proteins Hmgb2 and Hmgb3, and Tead2, a YAP signaling factor, are initially expressed in duct cells, and continue to be expressed in early Neurog3pos progenitors. We also found known TFs, including Isl1, Rfx6, Pax6, and Meis2, in the β-, α-, and δ-cell lineages. In line with a prior report, almost all endocrine cells in the β-, α-, and δ-lineages appear to pass through a Fevpos stage after Neurog3 expression (Byrnes et al., 2018). In this network, Fev is most specific to late progenitors. After islet cells transit through a Fevpos stage, Fev expression rapidly declines in the β-cell lineage but remains at detectable levels in α- and δ-cells (Supplementary Figure 4).
Network Pattern 3 includes TFs that follow an ON-OFF-ON pattern as cells differentiate from duct to progenitors to endocrine lineages. For example, Xbp1 is abundant in duct cells, but its levels decrease in early and late Neurog3pos progenitors, then increase in β-, α-, and δ-cells. In mice, loss of Xbp1 results in hyperglycemia (Lee et al., 2011), abnormal zymogen granules and aplasia of acinar cells (Hess et al., 2011). Xbp1 is an essential regulator of the unfolded protein response and endoplasmic reticulum (ER) stress (reviewed in Hetz, 2012). Similarly, Creb3 and Id2 follow the ON-OFF-ON pattern. These TFs were recently reported to be associated with ER and oxidative stress response programs in human islet β-cells (Xin et al., 2018).
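One way to flag an ON-OFF-ON profile computationally is sketched below. The rule is an assumption made for illustration: a TF is flagged when every intermediate stage is at least `fold` times lower than both the first and the last stage. The fold threshold, the collapsing of β-, α- and δ-cells into a single "endocrine" stage, and all expression values are hypothetical.

```python
def is_on_off_on(mean_by_stage, stage_order, fold=2.0):
    """True if expression dips in the middle stages relative to both ends."""
    values = [mean_by_stage[stage] for stage in stage_order]
    if len(values) < 3:
        raise ValueError("need at least three ordered stages")
    ends_low = min(values[0], values[-1])   # weaker of the two flanking stages
    middle_high = max(values[1:-1])         # strongest intermediate stage
    return ends_low > 0 and middle_high * fold <= ends_low


# Differentiation order assumed for this sketch.
STAGES = ["duct", "early progenitor", "late progenitor", "endocrine"]

# Hypothetical mean expression per stage (not values from the study).
toy_profiles = {
    "Xbp1": {"duct": 10.0, "early progenitor": 3.0,
             "late progenitor": 2.0, "endocrine": 9.0},
    "Neurog3": {"duct": 0.5, "early progenitor": 20.0,
                "late progenitor": 8.0, "endocrine": 0.2},
    "Chga": {"duct": 0.1, "early progenitor": 1.0,
             "late progenitor": 6.0, "endocrine": 15.0},
}
on_off_on = [tf for tf, profile in toy_profiles.items()
             if is_on_off_on(profile, STAGES)]
print(on_off_on)
```

With these toy values only the Xbp1-like profile is flagged; the transient (Neurog3-like) and monotonically increasing (Chga-like) profiles are not.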
Chromatin accessibility dynamics during islet endocrine cell differentiation
To investigate chromatin accessibility changes during endocrine cell differentiation, we performed ATAC-seq (Buenrostro et al., 2013) on purified populations of duct, endocrine progenitor, and endocrine cells isolated from E15.5 pancreas using the Neurog3-eGFP knock-in mice (Lee et al., 2002) (Figure 4A, Supplementary Table 6). In these mice, the coding region of Neurog3 is replaced by an eGFP cassette, thereby placing eGFP production under the control of the endogenous Neurog3 cis-regulatory elements, including the promoter. As reported previously, heterozygous Neurog3eGFP/+ animals form a complete endocrine pancreas with no discernible phenotypes (Lee et al., 2002). However, in homozygous Neurog3eGFP/eGFP animals, eGFPpos cells lack Neurog3 and fail to differentiate further into the endocrine lineage.
To achieve the specificity needed for experiments involving purification of Neurog3-expressing cells, we managed two concerns not addressed in prior studies (Scavuzzo et al., 2018; Xu et al., 2014). First, since Neurog3 protein is short-lived compared to eGFP (White et al., 2008), we needed methods to discriminate between eGFPpos Neurog3pos progenitors and eGFPpos Neurog3neg endocrine cells that have ceased to express Neurog3. We achieved this using modified cell sorting strategies (Sugiyama et al., 2007; see Methods). Second, to address possible concerns about Neurog3 gene dosage effects on endocrine cell differentiation, we used mice that are wild-type (“Tg(eGFP); Neurog3”, Gu et al., 2004), heterozygous, or homozygous null for Neurog3 (Figure 4B). This enabled direct comparison of chromatin states in endocrine progenitor cells with varying Neurog3 gene dosage. Specifically, we analyzed four distinct cell populations in different genetic backgrounds: (1) Neurog3pos hormoneneg cells (Neurog3); (2) eGFPpos Neurog3-null cells (Neurog3 null); (3) hormonepos islet cells (endocrine); and (4) duct cells (duct) (Figures 4A-B; Supplementary Table 6). In total we performed ATAC-seq on 15 primary pancreatic cell samples.
After aligning sequencing reads, we visually inspected loci near genes essential for pancreas development like Ptf1a, Neurod1 and Ins1 (Figure 4C). ATAC-seq revealed substantial reorganization of chromatin accessibility in regions near these and other genes (see below) during differentiation from duct cells to Neurog3pos endocrine progenitor cells, and endocrine cells. For instance, open chromatin “control regions” in the Ptf1a locus were detected in wild-type duct cells and Neurog3-null cells; the accessibility of this chromatin was then eliminated as duct cells transitioned into endocrine progenitors, a ‘closed’ state also maintained in endocrine cells (Masui et al., 2008). In Neurod1, an established Neurog3 target, promoter-proximal chromatin was closed in duct cells but became accessible in Neurog3pos endocrine progenitors. In the Ins1 locus, chromatin in control regions remained closed until cells committed to the endocrine lineage. Thus, cell purification combined with ATAC-seq generated chromatin maps that corresponded to distinct differentiation stages.
To investigate the similarity in chromatin states between ATAC-seq samples, we calculated pairwise Pearson correlation coefficients and organized samples by clustering (Figure 4D). This analysis revealed three groups that corresponded to duct cells, Neurog3pos progenitors and endocrine cells. Chromatin profiles of cells isolated either from wild-type or heterozygous Neurog3 mice were similar. Unexpectedly, Neurog3-null cells clustered with wild-type duct cells (Figure 4D). If ductal epithelia harbored bipotent cells that could become either endocrine progenitors or duct cells, we would expect Neurog3-null cells to cluster separately from duct cells. Instead, cells that activated Neurog3 transcription in the ductal epithelium, but could not differentiate into the endocrine lineage, have chromatin that is indistinguishable from that of duct cells. This suggests that chromatin ‘priming’ in duct cells prior to expression of Neurog3 is not required for endocrine differentiation. Furthermore, Neurog3 might be a pioneer transcription factor, whose functions include the capacity to initiate nucleosome displacement or conformational changes in inaccessible chromatin (Figure 4E) (Zaret and Mango, 2016).
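The pairwise comparison of samples can be illustrated with a small sketch that computes Pearson correlation coefficients between accessibility profiles and reports each sample's closest neighbor. The sample names and the six-peak signal vectors are hypothetical; a real analysis would use normalized signal over all peaks and a hierarchical clustering step that is not shown here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(profiles):
    """Return {sample: {other_sample: r}} for every pair of samples."""
    return {a: {b: pearson(profiles[a], profiles[b]) for b in profiles}
            for a in profiles}


def most_similar(corr, sample):
    """Name of the other sample with the highest correlation to `sample`."""
    return max((s for s in corr[sample] if s != sample),
               key=lambda s: corr[sample][s])


# Hypothetical accessibility signal at six peaks per sample.
toy_profiles = {
    "duct": [9, 8, 7, 1, 1, 0],
    "neurog3_null": [8, 9, 6, 1, 0, 1],
    "progenitor": [1, 2, 1, 7, 8, 6],
    "endocrine": [0, 1, 1, 8, 9, 9],
}
corr = correlation_matrix(toy_profiles)
print(most_similar(corr, "neurog3_null"))
```

In this toy setting the Neurog3-null profile is closest to the duct profile, mirroring the kind of grouping described in the text.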
Differentially accessible chromatin regions reveal cis-regulatory elements that mediate endocrine lineage specification
To identify differentially accessible chromatin regions in our sorted cell types, we analyzed the ATAC-seq signal at every peak across all samples using the DESeq algorithm (Anders and Huber, 2010). From a total of 116,942 ATAC-seq peaks, we found 10,687 that have significant accessibility changes between samples (FDR < 0.001). k-means clustering of differentially open peaks revealed three main groups of genomic regions that represent the open chromatin profiles of distinct cell states (Figure 5A, Supplementary Table 7). In Group I we observed 2,754 accessible regions in duct cells (either wild-type or Neurog3-null) that switch to a closed state in Neurog3pos progenitors and remain closed in endocrine cells. Using the GREAT algorithm (McLean et al., 2010), we found that these regions were associated with genes that have established roles in exocrine pancreas cell development, gland development and cell proliferation like Fgfr, Smad, Ptf1a, Hes1, and Notch signaling (Figure 5B). Group II includes 6,312 and Group III includes 1,621 accessible regions (Figure 5A). Based on the ATAC-seq signal, we observed that these regions are closed in duct cells, open in Neurog3pos progenitors and remain open in endocrine cells. The regions in Group III have significantly stronger ATAC-seq signal in endocrine cells compared to endocrine progenitors, suggesting that other regulatory factors independent of Neurog3 might be enhancing the accessibility in these regions once the cells begin producing hormones. GREAT analysis linked chromatin from Groups II and III to genes known to regulate endocrine pancreas differentiation, or cardinal features of islet function including peptide hormone processing, and regulation of calcium ion-dependent exocytosis (Figure 5B).
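The grouping of differentially accessible peaks by k-means can be illustrated with a compact, dependency-free sketch. Several choices here are assumptions made for the example and are not taken from the text: the deterministic farthest-point seeding (standard implementations usually seed randomly), the use of three columns per peak (duct, progenitor, endocrine signal), and the toy signal values.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding: take the first point, then repeatedly add the
    point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids)))
    return [list(c) for c in centroids]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k),
                          key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical peak signal as (duct, progenitor, endocrine).
toy_peaks = [
    [9, 1, 1], [8, 2, 1],    # open in duct only
    [1, 8, 8], [2, 9, 8],    # open in progenitors and endocrine cells
    [1, 3, 10], [1, 4, 9],   # strongest in endocrine cells
]
labels = kmeans(toy_peaks, k=3)
print(labels)
```

The three toy patterns are loosely modeled on the Group I, II and III behaviors described above; the cluster numbering returned by k-means is arbitrary.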
To discover TF motifs within these dynamic chromatin regions, we performed TF motif enrichment analysis using the HOMER algorithm (Heinz et al., 2010). Consistent with GREAT analysis, we found overrepresented motifs (Figure 5C) of exocrine lineage-specific factors like Tead, Rbpj and Nr5a2 in accessible chromatin regions of duct cells in Group I. In contrast, our analysis of regions in Group II identified Neurog3, NeuroD, Rfx and Pax motifs, all of which belong to known regulators of endocrine pancreas development. Likewise, the analysis of Group III regions yielded enriched TF motifs of lineage markers of β- and α-cells, including Mafb and Isl1. Thus, by combining cell sorting, mouse genetics, and ATAC-seq we identified developmentally resolved chromatin states, and found sequence motifs enriched for regulators of pancreas development, demonstrating the sensitivity and specificity of our approach.
Identifying TF occupancy in regulatory genomic regions during endocrine cell differentiation
Chromatin accessibility assays, like ATAC-seq and DNase-seq, enable identification of TF occupancy sites where DNA is protected from enzymatic cleavage or transposition due to TF binding, leaving a “TF footprint” (Buenrostro et al., 2013; Maurano et al., 2012). We envisioned that an integrative approach combining TF footprint and single-cell gene expression profiles could uncover TF activity during endocrine pancreas differentiation. We used the BaGFoot algorithm to identify changes in TF occupancy between two cell states using our ATAC-seq samples (Baek et al., 2017). BaGFoot calculates two parameters for each TF motif: (1) footprint depth (FPD), the relative protection of DNA at the TF motif site, and (2) flanking accessibility (FA), the quantification of accessible chromatin near the TF motif (Figure 6A). TF binding dynamics are expected to affect these two parameters genome-wide; thus, by comparing the FPD and FA between two samples, we can infer changes in TF activity. For instance, a motif with a deep FPD and high FA would indicate strong protection at the motif site. These results are represented in “bagplots”, which are analogous to “box and whisker” plots (Figure 6B, also see Methods).
We calculated the FPD and FA values for more than 650 curated TF motifs using our ATAC-seq data. Pairwise comparison of footprint signatures in duct cells and Neurog3pos progenitors, or duct cells and endocrine cells revealed changes in TF activity. Consistent with the HOMER-based motif analysis, we found strong footprint signals for Gata and Onecut TFs, and Nuclear Receptors in duct cells. In endocrine cells, we detected footprints for homeobox TFs including Isl1, Hnf1a and Pou TFs (Figures 6C-E, Supplementary Table 8). Comparison of Neurog3pos progenitors and endocrine cells revealed relatively modest TF activity changes (Figure 6D). Similar to the findings above, the most significant changes in TF footprint activity occur during the transition from the ductal to the endocrine progenitor state, supporting the view that activation of Neurog3 is the main driver of changes in chromatin accessibility and gene expression.
We also calculated the FA and FPD scores of the TF motifs we derived de novo from our ATAC-seq motif enrichment analysis (Figure 5C). These motifs displayed increased FA or FPD in the appropriate cell type (indicated in bold, Figures 6C-E and Supplementary Table 8), independently validating the TF occupancy at these sequences.
While footprint depth and flanking accessibility are often correlated, some TFs only exhibited increased flanking accessibility without a detectable footprint, likely due to distinct DNA binding kinetics, for instance in TFs with high OFF rates (Baek et al., 2017; Corces et al., 2018). TFs matching this profile were basic helix-loop-helix (bHLH) factors including Neurog3, Neurod1 and Ascl2 in endocrine progenitors. In addition, some motifs were found in the second quadrant, displaying deeper FPD, but decreased FA in endocrine or Neurog3pos progenitor samples compared to duct cells. This profile is consistent with repressor TFs, whose DNA binding activity leads to decreased accessibility surrounding the motif. We found that Tead factors and ETS family TFs, including Etv6, Elf2/4 and Erf, were included in this group (Figures 6C, 6E).
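The interpretation of the two footprint parameters can be summarized as a simple decision rule. The sketch below is not the BaGFoot implementation and does not reproduce its statistics; it only encodes the qualitative categories discussed in the text. The sign convention (a positive `delta_fpd` means a deeper footprint in the second sample), the tolerance `tol`, the category labels and the motif values are all assumptions made for illustration.

```python
def classify_motif(delta_fa, delta_fpd, tol=0.0):
    """Classify a motif from its change in flanking accessibility (delta_fa)
    and footprint depth (delta_fpd) between two samples.

    Positive delta_fpd is taken to mean a deeper footprint (assumption)."""
    fa_up = delta_fa > tol
    fa_down = delta_fa < -tol
    deeper = delta_fpd > tol
    if fa_up and deeper:
        return "increased occupancy"
    if fa_up:
        return "accessibility gain without footprint"
    if fa_down and deeper:
        return "repressor-like"
    return "no gain"


# Hypothetical (delta_fa, delta_fpd) values for four unnamed motifs.
toy_motifs = {
    "motif_A": (0.8, 0.5),
    "motif_B": (0.8, 0.0),
    "motif_C": (-0.6, 0.4),
    "motif_D": (-0.2, -0.1),
}
calls = {name: classify_motif(fa, fpd) for name, (fa, fpd) in toy_motifs.items()}
print(calls)
```

Under this rule, a bHLH-like profile (higher flanking accessibility, no detectable footprint) falls into the second category, and a profile with a deeper footprint but lower flanking accessibility falls into the repressor-like category.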
Paralogous TFs often bind similar DNA motifs, resulting in nearly identical footprint scores. For instance, the Neurog3 motif could also be recognized by Neurog1 or Neurog2 (Figures 6C, 6E). Thus, footprint analysis alone cannot determine which TF family member might be occupying the regulatory sequences in a particular cell type. Integrating BaGFoot results with single-cell expression data overcomes this limitation. We found more than 50 TFs whose expression correlates with a matching footprint (Figure 6F, Supplementary Figure 5, Supplementary Table 9). Among the TFs whose expression was detected in at least 25% of the cells within each group (Figure 6F), we confirmed the activity of known regulators, for instance Nr5a2 and Gata4 in duct cells (Hale et al., 2014; Xuan et al., 2012). In addition, we found footprints of several relatively less-studied Nuclear Receptor TFs (Nr2f6, Nr3c1) and we identified a CTF/NFI factor, Nfix, that has increased activity in Neurog3pos progenitor cells (Figure 6D, Supplementary Figure 5). Taken together, footprint and expression analysis predicted dozens of regulators whose roles have not been previously explored in endocrine cell development, and provided quantitative evidence of selective TF occupancy in different pancreatic cell types.
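The use of expression data to choose among paralogs that share a motif can be sketched as a detection filter. The 25% cutoff comes from the text; the detection threshold `min_count`, the function names and the toy cells are hypothetical.

```python
def fraction_detected(cells, gene, min_count=1):
    """Fraction of cells in which `gene` reaches `min_count` transcripts."""
    return sum(1 for cell in cells if cell.get(gene, 0) >= min_count) / len(cells)


def expressed_family_members(family, cells, min_fraction=0.25):
    """Family members detected in at least `min_fraction` of the cells."""
    return [tf for tf in family
            if fraction_detected(cells, tf) >= min_fraction]


# Paralogs that could recognize the same motif.
neurog_family = ["Neurog1", "Neurog2", "Neurog3"]

# Hypothetical progenitor cells (transcript counts are invented).
toy_progenitors = [
    {"Neurog3": 30},
    {"Neurog3": 12, "Neurog2": 1},
    {"Neurog3": 25},
    {"Neurog3": 8},
    {"Neurog3": 5},
]
candidates = expressed_family_members(neurog_family, toy_progenitors)
print(candidates)
```

A footprint at the shared motif would then be attributed to the family member or members that pass the filter in that cell group.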
DISCUSSION
Here, we established an integrative approach combining cell purification, genetic labeling, single-cell transcriptomics, chromatin accessibility assessment and TF footprint analysis to elucidate molecular mechanisms underlying pancreatic endocrine cell specification. We show that endocrine cell development is a dynamic process involving a network of TFs whose expression is selectively tuned to define specific hormone lineages. We were able to delineate gene expression changes leading to δ-cell specification, and nominate unrecognized factors that could regulate δ-cell function. We demonstrate that in developing pancreatic epithelial cells, chromatin undergoes substantial reorganization upon Neurog3 induction. In remodeled genomic regions during development, we identified enriched TF motifs and footprints that correspond to TF activity in specific cell types.
A few prior studies (Scavuzzo et al., 2018; Yu et al., 2019) postulated that the Neurog3pos progenitors exhibit heterogeneity and temporal lineage biases. In our study, using the same mouse models and embryonic stages, we did not find evidence for such bias even though our gene expression results aligned well with differential gene expression reported by Scavuzzo and colleagues. Thus, differences in our findings may reflect interpretation of alternative analytical approaches, rather than primary data. Similar points about challenges in single-cell analysis and biological interpretation were discussed in recent reviews (Kiselev et al., 2019; Tritschler et al., 2019).
Using an iterative, semi-supervised clustering approach, we identified branching points that specify three hormone lineages: the β-, α-, and δ-cell lineages. In our dataset we found only 13 PP cells, which did not provide sufficient statistical power to permit a PP-branch identification. Due to the known regulatory role of TFs, we focused on differentially expressed TFs between these lineages. We identified known, as well as previously understudied, pancreatic TFs that may have roles in islet endocrine cell specification. Based on TF expression in specific developmental timelines, we generated a network and observed that lineage specification is governed by TFs with dynamic, overlapping expression profiles. For instance, while Neurog3 is necessary for the endocrine lineage, it needs to be turned off to permit further differentiation of endocrine cell lineages. We speculate that this may explain the low efficiency observed in direct reprogramming approaches when a handful of lineage-specific TFs are constitutively overexpressed to force non-islet cells toward a β-cell fate (Hickey et al., 2013; Li et al., 2014). Our focused analysis on Neurog3pos cells revealed that the pan-endocrine state precedes specific endocrine lineages, and the early endocrine cells are polyhormonal as defined by their transcriptome. This may explain why the interconversion of hormone cell types does not require Neurog3 (Chakravarthy et al., 2017; Furuyama et al., 2019). These results are also reminiscent of reports of polyhormonal cells generated during in vitro differentiation experiments using human embryonic stem cells or adult tissues of endodermal origin (Galivo et al., 2017; Krentz et al., 2018; Lee et al., 2013; Petersen et al., 2017; Veres et al., 2019).
Chromatin accessibility is thought to be a better predictor of cell identity than transcriptome analysis, with changes in chromatin states often preceding changes in gene expression (Corces et al., 2016). By taking advantage of established cell markers and genetic models, we were able to dissect the chromatin accessibility changes during endocrine cell differentiation at unprecedented resolution. The unexpected similarity between duct cells and those that activate Neurog3 forces a re-evaluation of extant endocrine cell development models. For example, our findings provide evidence that pancreatic ‘trunk cells’ previously postulated to be oligopotent progenitors are simply duct cells that default to the ductal lineage in the absence of Neurog3 (Figure 4E). Comparison of Neurog3pos cells from heterozygous (Neurog3+/eGFP) and homozygous wild-type (Tg(eGFP);Neurog3+/+) mice showed that a single, wild-type Neurog3 allele is sufficient to drive global chromatin reconfiguration in the pancreatic endocrine lineage. It is likely that in individual ductal epithelium cells, Neurog3 concentration needs to reach a critical threshold to achieve pioneering activity and to compete with histone proteins for DNA binding (Bankaitis et al., 2015; Klemm et al., 2019).
Using a TF footprint algorithm, we provide quantitative, cell type-specific TF occupancy profiles at nucleotide resolution in pancreatic duct, endocrine progenitor and endocrine cell regulatory DNA. To our knowledge, this is the most comprehensive analysis of TF activity correlated with gene expression during pancreas development. TF-regulatory DNA interactions form the basis of gene regulatory networks, which are central to determining and maintaining cell type-specific transcription, cell fate and function. Further delineation of gene regulatory networks defining pancreatic cell lineages will be crucial for understanding pancreas disorders, and has the potential to improve gene therapy approaches using CRISPR-guided synthetic engineering to generate cells and tissues (Bevacqua et al., 2021). Expanding these strategies to human pancreas or in vitro differentiation efforts using emerging single-cell technologies that query chromatin and gene expression profiles (Ma et al., 2020) could offer new approaches to investigating the pathogenesis of type 1 and type 2 diabetes.
METHODS
Animal models
All animal experiments were conducted in accordance with Stanford University IACUC guidelines. Neurog3eGFP/+ knock-in reporter mice were a kind gift from Dr. Klaus Kaestner (University of Pennsylvania, USA) (Lee et al., 2002) and were maintained on a CD1 background. Neurog3-Cre mice were obtained from Guoqiang Gu (Vanderbilt University, USA) and maintained on a mixed background of C57BL/6 and CD1 (Gu et al., 2002). Rosa-mTmG (Muzumdar et al., 2007) mice were obtained from The Jackson Laboratory and maintained on a mixed background of C57BL/6 and CD1. Tg-eGFP; Neurog3+/+ transgenic mice were a kind gift from Drs. Guoqiang Gu and Douglas Melton (Gu et al., 2004). Timed matings were used to obtain mice at embryonic day (E) 15.5 and E17.5 for experiments; the day a vaginal plug was observed was considered E0.5 for embryonic staging purposes. Both male and female mice were used in all experiments.
Tissue processing and FACS
Pancreata were dissected from E15.5 and E17.5 embryos and checked for GFP using a fluorescence dissecting microscope. GFPpos pancreata were then digested with TrypLE Express (ThermoFisher, 12605-010) for 5 minutes at 37 °C, with regular pipet agitation to disrupt tissue. The digestion reaction was stopped by adding FACS buffer, which contains Ca2+ and Mg2+ free PBS supplemented with 2% bovine serum albumin and 10 mM EGTA. The cell suspension was filtered to remove debris using a 70-micron cell strainer (BD Biosciences). Red blood cells were eliminated from dissociated cells using an RBC lysis buffer (BioLegend). Cells were then stained with Aqua live/dead viability dye (Thermo Fisher) to exclude dead cells during sorting. Cells were incubated with a blocking solution containing FACS buffer and goat IgG (Jackson Labs, 1:20 dilution) prior to staining with cell surface antibodies. After blocking, antibody staining was performed on ice for 30 minutes using the following antibodies: biotin mouse anti-CD133 (13A4, 1:100; eBioscience), Streptavidin-APC (1:200; eBioscience). We also used CD45-PE-Cy7 (eBioscience) to label and exclude leukocytes. We previously showed that CD133 labels Neurog3pos endocrine progenitors and duct cells (Sugiyama et al., 2007). By contrast, hormonepos islet cells that no longer produce Neurog3 are CD133neg. After exclusion of CD45pos cells, the following gating strategies defined pancreas cell subpopulations: GFPposCD133neg cells were considered ‘endocrine’, GFPposCD133pos cells were ‘Neurog3pos’ or ‘Neurog3 null’ if obtained from null animals, and GFPnegCD133pos cells were considered ‘duct’ (Benitez et al., 2014; Sugiyama et al., 2007). Representative gates are shown in Figure 4B. Note that the GFP intensity of Neurog3-null cells is reduced. In wild type cells, Neurog3 normally enhances its own expression through an auto-regulatory “positive feedback loop”. In null cells this mechanism is most likely absent (Ejarque et al., 2013; Lee et al., 2002; Wang et al., 2008).
Single-cell RNA Sequencing
Single-cell RNA-seq libraries were generated using the SMART-Seq2 method as described (Picelli et al., 2014). Dissociated cells were sorted directly into 96-well plates containing lysis buffer with ERCC RNA spike-in controls (ThermoFisher). The details about the sorted cell populations, genotypes, and associated plate codes are available in the GEO metadata file linked to this study. The lysis reaction was followed by reverse transcription with template switching, using LNA-modified template-switch oligos to generate cDNA. After pre-amplification, cDNA was purified and analyzed on an automated Fragment Analyzer (Advanced Analytical). The cDNA fragment profile of each single cell was individually inspected, and only wells with successful amplification products (concentration higher than 0.06 ng/µl) and no detectable RNA degradation were selected for final library preparation. Tagmentation assays and barcoded sequencing libraries were prepared using the Nextera XT kit (Illumina) according to the manufacturer’s instructions. Barcoded libraries were pooled and subjected to 75 bp paired-end sequencing on the Illumina NextSeq instrument.
RNA-Seq Read Alignment
Raw reads passing quality control using FastQC were aligned to a custom reference genome consisting of FASTA files for mm10, ERCC spike-in controls, and three transgenes: eGFP, tdTomato, and Cre. STAR was used to build the custom genome index and to align the reads (Dobin et al., 2012). The resulting BAM/SAM files were used to create a ‘master counts table’ using HTSeq (Supplementary Table 1) (Anders et al., 2015). Cells had an average of 3,044 genes expressed per cell, ranging from 1,237 to 6,047 genes.
Unsupervised Single-cell Clustering and Trajectory Analyses
Clustering and trajectory analysis were performed using the single-cell analysis package Monocle 2 (v. 2.4.0) (Qiu et al., 2017b). A flowchart summarizing each analysis step is provided in Supplementary Figure 2. Before starting the analysis, the transgenes GFP, Cre and tdTomato were removed from the master counts table (Supplementary Table 1). Unsupervised clustering aims to cluster the cells based on global gene expression profiles. The first step is to choose which genes to use to cluster the cells. Based on the dispersion calculations, we set the mean_expression parameter to 1. Before performing dimension reduction, the data was examined using the plot_pc_variance_explained function, which plots the percentage of variance explained by each principal component on the normalized expression data. Based on the ‘elbow’ method, we determined that the first 5 dimensions captured the majority of data variability. Therefore, t-distributed stochastic neighbor embedding (t-SNE) dimension reduction was performed on the first 5 principal components. We set num_clusters to 7 to visualize cell clusters (Supplementary Figure 1). The identity of the cell clusters was revealed by mapping marker gene expression levels onto single cells (Figure 1A-B). Clusters 4 and 6 were combined and labeled “Endocrine 1”. At this point, the 14 mesenchymal cells that formed Cluster 5, and genes that were expressed in fewer than 5 cells, were filtered out. To establish pseudotime trajectories, Monocle’s differentialGeneTest function was used to find genes that vary among the clusters, specified as fullModelFormulaStr = "~Cluster". The top 100 genes with the lowest q-value were used to order cells, and a pseudotime trajectory was constructed using the DDRTree method. To identify gene expression changes between cells aligned along the established pseudotime trajectories, we used Monocle’s differentialGeneTest function by specifying fullModelFormulaStr = "~Pseudotime". We considered genes significant if the rounded q-value was less than or equal to 0.05. Gene ontology terms were found for each of the 7 clusters using DAVID v6.8 (Supplementary Table 3) (Huang et al., 2009).
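The gene-filtering and gene-selection steps above were run inside Monocle 2. As a language-neutral illustration only, the following Python sketch mimics two of them: dropping genes detected in fewer than 5 cells, and picking the 100 genes with the lowest q-values. The function names and the "count > 0 means detected" rule are assumptions made for this sketch, not part of the Monocle API.

```python
def filter_genes_by_detection(counts, min_cells=5):
    """Keep genes detected in at least `min_cells` cells.

    counts: dict mapping gene name -> list of per-cell read counts.
    Assumption: a gene counts as detected in a cell when its count is > 0.
    """
    return {
        gene: values
        for gene, values in counts.items()
        if sum(1 for v in values if v > 0) >= min_cells
    }


def top_genes_by_qvalue(qvalues, n=100):
    """Return the `n` gene names with the lowest q-values, best first.

    qvalues: dict mapping gene name -> q-value from a differential test.
    """
    ranked = sorted(qvalues.items(), key=lambda item: item[1])
    return [gene for gene, _ in ranked[:n]]
```

The selected gene list would then play the role of the ordering genes passed to the trajectory step.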
Semi-supervised Single-cell Clustering and Trajectory Analyses
Semi-supervised clustering and trajectory analyses were performed to resolve individual endocrine lineage branching (Supplementary Figure 2). The process begins with defining marker genes that represent cell populations, then identifying the genes that co-vary with these markers, and finally ordering the cells based on these co-varying genes. Monocle provides the CellTypeHierarchy function for semi-supervised clustering analysis. Since our goal was to resolve the β-, α- and δ-cell branches, we chose the following marker genes: Neurog3 for endocrine progenitors, Ins1 and Ins2 for β-cells, Gcg for α-cells, and Sst for δ-cells. We set the expression threshold in each cell for these markers to 100 or more reads. Accordingly, cells that express more than one marker gene are labeled “ambiguous” and cells that do not fit into any marker gene category are labeled “unknown”. The gene list was further filtered to remove genes detected in fewer than 5 cells. The top 100 genes that co-varied with each marker (400 genes in total) were considered for the clustering and trajectory analysis. Note that the semi-supervised analysis was limited to the 317 cells that were placed after the Neurog3 peak expression in the unsupervised trajectory, which corresponds to the pseudotime point 6.7. The first iteration separated β-cells in one branch and the majority of α- and δ-cells in a second branch. To split the α- and δ-branches, we again focused on cells of interest, and excluded the cells on the β-cell branch to create a new CellDataSet (cds) object in Monocle. In this new cds object, cells were relabeled as α-, δ-, and Neurog3pos cells based on marker gene expression.
Trajectory analysis was performed as described earlier. The final iteration established trajectories with α- and δ-cells separated on their own branches. As in the unsupervised analysis, Monocle’s differentialGeneTest function (with fullModelFormulaStr = "~Pseudotime") was used to identify genes whose expression changes significantly during each endocrine lineage specification. For differential gene expression analysis, cells with pseudotime point > 5.7 and ≤ 6.7 were also included (peak Neurog3 expression) to visualize the cell fate transitions beginning from the Neurog3pos progenitors. Hence, three differential gene tests were performed to determine transcriptome changes from Neurog3pos progenitor cells to each of the three endocrine lineages. Results from differential expression analyses were filtered to include genes with a q-value less than 0.1 and those in the top 50% of normalized base mean expression among cells within each branch. All differentially expressed gene lists were further narrowed to TFs, for a total of 145 TFs. These TFs are visualized in a heatmap where all cells were aligned in pseudotime order (Supplementary Figure 4).
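The marker-based labeling rule used in the semi-supervised step (100 or more reads for a marker, "ambiguous" for more than one cell type, "unknown" for none) can be sketched in plain Python as follows. This is an illustration of the rule, not Monocle's CellTypeHierarchy implementation; the `MARKERS` table, the `label_cell` name, and the choice that either Ins1 or Ins2 alone qualifies a cell as β are assumptions made for this sketch.

```python
# Hypothetical marker table mirroring the markers named in the text.
MARKERS = {
    "progenitor": ["Neurog3"],
    "beta": ["Ins1", "Ins2"],
    "alpha": ["Gcg"],
    "delta": ["Sst"],
}


def label_cell(reads, markers=MARKERS, threshold=100):
    """Assign a cell-type label from marker read counts.

    reads: dict mapping gene name -> read count for one cell.
    A cell type matches when any of its marker genes has >= `threshold`
    reads (assumption: Ins1 OR Ins2 is enough for the beta label).
    """
    hits = [
        cell_type
        for cell_type, genes in markers.items()
        if any(reads.get(gene, 0) >= threshold for gene in genes)
    ]
    if len(hits) == 1:
        return hits[0]
    return "ambiguous" if hits else "unknown"
```

For example, a cell with 150 Gcg reads and 200 Sst reads matches two categories and is therefore labeled "ambiguous".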
Analysis and classification of Neurog3pos progenitors
The master read counts table (Supplementary Table 1) was subset to select Neurog3pos cells. We defined Neurog3pos cells as any cell with at least 10 read counts for Neurog3, resulting in 214 cells. The semi-supervised clustering approach was used to label and cluster cells based on either Neurog3 or Chga expression (see the previous section). The top 100 genes that co-varied with each marker gene (200 genes in total) were considered for the clustering and trajectory analysis. t-SNE dimension reduction was performed on the first two principal components, and num_clusters was set to 3. Based on the Neurog3 levels, the clusters were named High, Medium, and Low. A trajectory was established by finding differentially expressed genes among the High, Medium, and Low clusters, using Monocle’s differentialGeneTest function by specifying fullModelFormulaStr = "~Cluster". The top 100 genes with the lowest q-value were used to order cells, and a pseudotime trajectory was constructed using the DDRTree method. The trajectory was colored based on embryonic day (Figure 2F) or cluster (Figure 2H).
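As a rough illustration of the subsetting and naming steps above, the sketch below selects cells with at least 10 Neurog3 reads and names three clusters by their Neurog3 level. Both function names are hypothetical, and ranking clusters by their mean Neurog3 expression is an assumption: the text states only that clusters were named according to Neurog3 levels.

```python
def select_neurog3_pos(counts_by_cell, min_reads=10):
    """Return IDs of cells with at least `min_reads` Neurog3 read counts.

    counts_by_cell: dict mapping cell ID -> dict of gene -> read count.
    """
    return [
        cell
        for cell, reads in counts_by_cell.items()
        if reads.get("Neurog3", 0) >= min_reads
    ]


def name_clusters_by_level(cluster_means):
    """Name three clusters High/Medium/Low by descending Neurog3 level.

    cluster_means: dict mapping cluster ID -> mean Neurog3 expression
    (assumption: the mean is the summary used for ranking).
    """
    ordered = sorted(cluster_means, key=cluster_means.get, reverse=True)
    return dict(zip(ordered, ["High", "Medium", "Low"]))
```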
To count hormone-expressing cells, we analyzed the read counts of Ins1, Ins2, Gcg and Sst in each Neurog3pos cell. Any detectable expression (i.e. size-factor normalized counts > 0) was counted. The cells were then categorized as expressing zero, one, two or three hormones (Ins1 and Ins2 reads were combined and presented as Ins).
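The counting rule above is simple enough to state as code. The following Python sketch combines Ins1 and Ins2 into a single Ins value and counts how many of the three hormones have normalized counts above zero; the function names are chosen for illustration only.

```python
from collections import Counter


def count_hormones(norm_counts):
    """Count hormones (Ins, Gcg, Sst) with normalized counts > 0 in one cell.

    norm_counts: dict mapping gene name -> size-factor normalized count.
    Ins1 and Ins2 are summed and treated as a single hormone, Ins.
    """
    levels = {
        "Ins": norm_counts.get("Ins1", 0) + norm_counts.get("Ins2", 0),
        "Gcg": norm_counts.get("Gcg", 0),
        "Sst": norm_counts.get("Sst", 0),
    }
    return sum(1 for value in levels.values() if value > 0)


def tally_hormone_classes(cells):
    """Tally how many cells express zero, one, two or three hormones."""
    return Counter(count_hormones(cell) for cell in cells)
```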
Expression Specificity Scores, TF-Cell Type/State Network
We derived expression specificity scores (ESS) for TFs that are differentially expressed during endocrine cell lineage specification. We have previously used this method to reveal cell type-specific gene expression in human pancreas cells (Arda et al., 2018). ESS was calculated as follows: A cell state is defined here as a population of cells that are quantitatively distinct based on their transcriptome. Two cell states (early progenitor, late progenitor) and four cell types (duct, β-, α- and δ-cells) were used to determine the expression specificity score of each TF. The duct cells were categorized as cells with pseudotime values < 3 (53 cells) based on the unsupervised trajectory analysis. The early progenitor state has cells with pseudotime values between 3 and 6.7 (111 cells). Late progenitor state cells have a pseudotime value greater than 6.7 and include those that were not assigned to an endocrine lineage (121 cells). The hormone-producing cells consist of those assigned to their respective endocrine cell branch (90 cells in the β-lineage, 76 cells in the α-lineage, and 30 cells in the δ-lineage). To obtain xi, we used the size-factor normalized single-cell RNA-Seq counts as gene expression values. Thus, a TF with an ESS of zero would indicate no expression in that cell type/state, and an ESS of 1 would indicate exclusive expression, i.e. the TF is only expressed in that cell state. We obtained the list of differentially expressed TFs by overlapping the gene list with a curated TF list (Weirauch et al., 2014), yielding 145 TFs. The TF list was further narrowed to 87 by only including those that were detected in at least 50% of the cells in that cell type/state (Supplementary Table 5). The network was generated with Cytoscape (version 3.8.2) (Shannon et al., 2003). The color and thickness of the network edges (connections) correspond directly to the ESS of the TF in the interacting cell type/state.
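The ESS equation itself is not reproduced in the text above. One form that is consistent with the stated properties (a score of 0 for no expression and 1 for exclusive expression) is the fraction x_i / Σ_j x_j, where x_i is a TF's expression in cell type/state i and the sum runs over all six types/states. The Python sketch below implements that fraction purely as an assumption; it should not be read as the authors' exact formula, and the function name is hypothetical.

```python
def expression_specificity(expr_by_state):
    """Assumed ESS: each state's share of a TF's total expression.

    expr_by_state: dict mapping cell type/state -> expression value x_i
    (for example, a mean of size-factor normalized counts).
    Returns a dict with values between 0 and 1 that sum to 1, or all
    zeros when the TF is not expressed anywhere.
    """
    total = sum(expr_by_state.values())
    if total == 0:
        return {state: 0.0 for state in expr_by_state}
    return {state: value / total for state, value in expr_by_state.items()}
```

Under this assumed form, a TF expressed only in δ-cells would score 1 for δ-cells and 0 for every other type/state.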
ATAC-seq assays and data processing
Three mouse genotypes were used for ATAC-seq analysis: Tg-eGFP; Neurog3+/+, Neurog3eGFP/+, and Neurog3eGFP/eGFP. From these animals, different cell populations were isolated as described in the ‘Tissue processing and FACS’ section (also see Supplementary Table 6). ATAC-seq was performed following the protocol in Buenrostro et al., 2013. On average, 10,000 sorted cells were used for each ATAC-seq assay. Sorted cells were pelleted at 300 g and washed once with PBS. Nuclei were isolated, followed by the transposition reaction.
Transposed DNA fragments were purified using the Qiagen MinElute kit and amplified for 6-8 cycles using the Nextera (Illumina) PCR primers. Libraries were sequenced as 2x50 on the HiSeq2000 platform. ATAC-seq data processing and genome alignment were performed with PEPATAC (version 0.8.2), a pipeline developed to analyze ATAC-seq samples (Smith et al., 2021). PEPATAC begins by trimming adapters using skewer (version 0.2.2) with the parameters “-f sanger -t 8 -m pe”. Trimmed fastq files were then mapped to the mm10 genome with bowtie2 (Langmead et al., 2009) and the parameter “--very-sensitive”. Lastly, peaks were called using MACS2 (Feng et al., 2012) with “-q 0.01 --shift 0 --nomodel”. At the end of PEPATAC processing, 42-88 million reads aligned to the mouse genome and 15,377-55,676 peaks per sample were detected. These peak regions were then merged using BedTools (Quinlan and Hall, 2010) to generate a non-overlapping consensus peak list for downstream analysis. ATAC-seq fragments corresponding to the peaks were quantified using the annotatePeaks.pl function in the HOMER suite, a genome analysis tool (v.4.10) (Heinz et al., 2010). DESeq (Anders and Huber, 2010) was used to find regions with significantly different ATAC-seq counts by running a generalized linear model with the modelFormula set to "count ~ condition" and "count ~ 1". Accordingly, DESeq calculates P-values and FDR. Peaks passing the FDR threshold < 0.001 were considered ‘differentially open regions’ (DORs) between cell types (approximately 10,600 DORs). The Pearson correlation coefficient was used to determine the similarity between ATAC-seq samples based on DORs. The results were visualized using the R package ggcorrplot with hierarchical clustering. DORs and samples were clustered with the Cluster 3.0 tool using the k-means method (de Hoon et al., 2004). ATAC-seq fragment counts were further normalized by log2 transformation after shifting values by +1 for visualization in TreeView (Saldanha, 2004).
To assign DORs to regulatory domains and putative target genes, we used the GREAT algorithm (v3.0.0) (McLean et al., 2010) with default settings. GREAT also outputs enriched GO Terms associated with these regions. For the GO Term enrichment analysis, DORs were used as test regions against the whole genome (mm10) as background.
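Two of the numerical steps in this section, the log2 transformation of counts shifted by +1 and the Pearson correlation between samples, are standard and can be sketched in a few lines of Python. This is a minimal standard-library illustration with hypothetical function names; it does not reproduce the R, Cluster 3.0 or TreeView workflow, and it makes no claim about whether the correlation was computed on raw or transformed counts.

```python
import math


def log2_plus_one(counts):
    """Return log2(count + 1) for each fragment count."""
    return [math.log2(c + 1) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(ss_x * ss_y)
```

Applying `pearson` to every pair of samples over the DOR counts would give the sample-by-sample similarity matrix that a tool such as ggcorrplot displays.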
TF motif enrichment analysis
HOMER’s findMotifsGenome.pl function with the ‘-size 500 -len 6,8’ options was used to find enriched TF motifs in each DOR group (Heinz et al., 2010). HOMER’s de novo motif discovery analysis outputs a position weight matrix (PWM) for each significant motif. These PWMs were queried in the CisBP database (Weirauch et al., 2014) to find transcription factors associated with the significant motifs.
BaGFoot analysis and integration of gene expression
BaGFoot footprint analysis was performed as described (Baek et al., 2017). Narrow peaks were called for all ATAC-seq samples using MACS2 (Feng et al., 2012) and merged to generate a set of consensus peaks for BaGFoot. Peaks overlapping with blacklisted regions (downloaded from http://mitra.stanford.edu/kundaje/akundaje/release/blacklists/mm10-mouse/mm10.blacklist.bed.gz, Amemiya et al., 2019) were removed from the analysis. 662 mouse TF motifs were curated from TRANSFAC (Matys et al., 2006), JASPAR (Mathelier et al., 2016) and UniPROBE (Newburger and Bulyk, 2009). In addition, we included 19 de novo motifs derived from our ATAC-seq data by HOMER motif analysis. ATAC-seq sample replicates were grouped as follows: the duct dataset consisted of duct-het, duct-null, and Neurog3-null samples, the Neurog3 dataset consisted of Neurog3-het and Neurog3-Tg samples, and the endocrine dataset consisted of Endo-het and Endo-Tg samples. The groups were compared pairwise to detect TF footprint activity at motif locations. BaGFoot results are presented in “bag plots”, where each data point represents a TF motif. In a bag plot, the bag area contains 50% of the data (similar to the box in a box plot), and the fence contains 97%-100% of the data points (similar to the whiskers in a box plot) (Rousseeuw et al., 1999). Any data point outside the fence is an outlier. Most TF motifs are not expected to be different between two conditions, and thus are localized around the origin. The significant motifs were statistically determined by Hotelling’s T-squared test and were labeled as outliers.
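The peak merging and blacklist filtering described above were done with dedicated tools. To make the interval logic concrete, here is a pure-Python sketch of both operations on BED-style (chromosome, start, end) tuples. It is a stand-in written for illustration, with hypothetical function names; treating touching intervals as mergeable and using half-open coordinates are choices made for this sketch.

```python
def merge_intervals(peaks):
    """Merge overlapping or touching peaks into a consensus list.

    peaks: iterable of (chrom, start, end) tuples, half-open coordinates.
    """
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def overlaps(a, b):
    """True when two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def remove_blacklisted(peaks, blacklist):
    """Drop every peak that overlaps any blacklisted region."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]
```

The quadratic scan in `remove_blacklisted` is fine for a small example but would be replaced by a sorted sweep or an interval tree for genome-scale peak sets.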
Based on the BaGFoot results, we compiled a list of outlier TFs (and their paralogs) to analyze their expression levels in the scRNA-Seq data. The 481 cells were divided into duct, progenitor, and endocrine cell types to obtain average expression levels for outlier TFs. Cells were assigned to one of these three cell types based on their placement in the pseudotime trajectory analyses. Endocrine cells are a combination of cells aligned on the β-, α-, and δ-branches (Supplementary Table 9). The TFs whose expression was detected in at least 25% of the cells within each cell group are listed in Figure 6F. Those detected in fewer than 25% of the cells are shown in Supplementary Figure 5.
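The 25% detection cutoff can be illustrated with a short Python sketch. The function names are hypothetical, "detected" is taken to mean a value above zero, and requiring the cutoff to hold in every group is one literal reading of "within each cell group"; all three are assumptions of this sketch rather than details confirmed by the text.

```python
def detected_fraction(values):
    """Fraction of cells in which a TF is detected (value > 0)."""
    return sum(1 for v in values if v > 0) / len(values)


def passes_detection(expr_by_group, min_fraction=0.25):
    """True when the TF is detected in >= `min_fraction` of cells per group.

    expr_by_group: dict mapping group name (e.g. duct, progenitor,
    endocrine) -> list of per-cell expression values for one TF.
    Assumption: the cutoff must hold in every group.
    """
    return all(
        detected_fraction(values) >= min_fraction
        for values in expr_by_group.values()
    )
```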
Author contributions
H.E.A. and S.K.K. conceived and coordinated the study. C.M.B., P.T.P., L.L. and K.T. bred the mice and performed embryonic pancreas dissection. H.E.A. performed FACS, ATAC-seq and scRNA-Seq experiments. M.E. and S.R.K. assisted with scRNA-Seq experiments and analysis. J.P.S. and N.C.S. wrote the PEPATAC pipeline. S.B. wrote and performed the BaGFoot analysis. E.D. and H.E.A. analyzed the results and performed computational analysis. E.D., S.K.K. and H.E.A. interpreted the findings and wrote the manuscript with input from the other co-authors.
ACKNOWLEDGMENTS
We thank P. Batista and the members of the Arda and Kim laboratories for discussions and suggestions. This work was supported by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research, USA, by funds from a JDRF Advanced Postdoctoral Fellowship (3-APF-2016-172-A-N) to H.E.A., and by the National Institute of General Medical Sciences (R35-GM128636) to N.C.S. Computational resources of the NIH HPC Biowulf cluster supported the analysis in this work (http://hpc.nih.gov).