Uncovering Transcriptional Dark Matter via Gene Annotation Independent Single-Cell RNA Sequencing Analysis

Michael F.Z. Wang; Madhav Mantri; Shao-Pei Chou; Gaetano J. Scuderi; David McKellar; Jonathan T. Butcher; Charles G. Danko; Iwijn De Vlaminck

doi:10.1101/2020.07.31.229575

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) enables the study of cell biology with high resolution. scRNA-seq expression analyses rely on the availability of a high quality annotation of genes in the genome. Yet, as we show here with scRNA-seq experiments and analyses spanning human, mouse, chicken, mole rat, lemur and sea urchin, gene annotations often fail to cover the full transcriptome of every cell type at every stage of development, in particular for organisms that are not routinely studied. To overcome this hurdle, we created a scRNA-seq analysis routine that recovers biologically relevant information beyond the scope of the best available gene annotation. This is achieved by performing single-cell expression analysis on any region in the genome for which transcriptional products are detected. Our routine identifies transcriptionally active regions (TARs) using a hidden Markov model, generates a matrix of expression levels for all TARs across all cells in a dataset, performs single-cell TAR expression analysis to identify TARs that are biologically significant, and then annotates biologically significant TARs using gene homology analysis. This procedure leverages single-cell expression analyses as a filter to direct annotation efforts to biologically significant transcripts in complex tissues and thereby uncovers biology to which scRNA-seq would otherwise be in the dark.

INTRODUCTION

Single-cell RNA sequencing (scRNA-seq) is a powerful tool to study development, tissue, and disease biology^1–9. scRNA-seq analyses currently rely on the availability of high quality genome annotations to define cell features and to perform cell clustering, dimensionality reduction, differential gene expression, and other analyses^10–14. Identifying genes and correctly annotating their locations and coding regions within the genome is a difficult task that collaborative projects such as Ensembl, ENCODE, and GENCODE are striving to achieve. Gene annotations are generated using homology analyses, updated based on experimental evidence, and manually-curated for select organisms such as human, mouse, and zebrafish¹⁵. For less routinely studied organisms, fewer RNA-seq datasets are available and gene annotations are dependent on purely computational approaches such as phylogenetic sequence comparison with minimal verification‥ Consequently, as we show here with scRNA-seq experiments and analyses spanning six species (human, mouse, chicken, gray mouse lemur, naked mole rat, and sea urchin), significant number of genes and cell type specific information is missed in current scRNA-seq workflows.

To overcome this hurdle, we propose a procedure to perform scRNA-seq data analysis in the absence of a high-quality gene annotation (Fig. 1A). This procedure first identifies transcriptionally active regions (TARs) using a hidden Markov model (HMM)¹⁶. These HMM-derived TARs capture the transcriptional activity in the sample, limited only by the scRNA-seq measurement and quality of the genome assembly. Single-cell TAR expression analyses are then performed, using both annotated TARs (aTARs) and unannotated TARs (uTARs), revealing the degree of biologically relevant information that exists both inside and outside of known gene annotations. As expected, we find that the degree of relevant information in uTARs is strongly dependent on organism, tissue, cell type, and stage of development. uTARs that are differentially expressed across different cell clusters are retrieved and annotated using gene homology and gene prediction approaches. We employed this approach to uncover uTARs that distinguish canonical cell types during early stages of chicken embryonic heart development, and that define cell types in the lemur lungs, naked mole rat spleen, and sea urchin embryo.

Figure 1 Generating de novo features based on genome coverage.

A) Workflow to generate TARs and to identify biologically meaningful uTARs. B) Total genome assembly sequence length for human (hg38 and hg16), mouse (mm10), chicken (GRCg6a), mouse lemur (Mmur_3.0), naked mole rat (HetGla_1.0), and sea urchin (Spur_4.2). C) Total number of annotated transcripts in existing annotations normalized to the assembly sequence length for humans (hg38 GENCODE v30, hg16 RefSeq), mouse (GENCODE vM21), chicken (GRCg6a Ensembl v96), mouse lemur (Mmur_3.0 RefSeq), naked mole rat (HetGla_1.0 RefSeq), and sea urchin (Spur_4.2 RefSeq). D) Relative number of unique scRNA-seq reads outside of gene annotations contained in uTARs for each cell shown as violin plots (3849 cells in hg38 and hg19, 6113 in mouse, 14008 in chicken, 6321 in lemur, 2657 in naked mole rat, 2658 in sea urchin). Mean values (black dots) and 2 standard deviations above and below the mean (black bars) are shown. E) Relative number of unique scRNA-seq reads outside of gene annotations for different human genome assemblies and annotations at different times (3849 cells). F) Example of groHMM defined aTAR (red) and uTAR (maroon) features along hg16 chr22 with RefSeq hg16 gene annotations shown in blue. Sense strand coverage plotted in black while antisense strand coverage plotted in grey (log-e scale).

The approach described here can be applied in two ways. First, our procedure enables more comprehensive single-cell analysis when a high-quality gene annotation is not available. Second, our results point to an application for scRNA-seq assays to create, curate, and correct gene annotations, where single-cell expression analysis is used as a filter to direct annotation efforts to those features which are biologically significant.

RESULTS

Gene annotation independent scRNA-seq analysis

We developed a gene-annotation-independent scRNA-seq analysis and benchmarked it using publicly available and new datasets generated on the 10X Genomics platform (Methods). We explored different genome assemblies and annotations, including recent assemblies for human, mouse, chicken, gray mouse lemur, naked mole rat, and sea urchin, as well as prior annotations for human (Figs. 1B-D). The size of the genome assemblies ranged from ~1 billion base pairs to ~3 billion base pairs (Fig. 1B). As a proxy for the completeness of different gene annotations, we compared the number of annotated transcripts relative to the size of the genome assembly (Fig. 1C). We found that less routinely studied organisms such as the naked mole rat and sea urchin, as well as an older human genome annotation (RefSeq hg16 annotations for the hg16 reference assembly) have a lower number of annotated transcripts relative to the genome assembly size, as expected. We next quantified the number of sequences that align outside the gene annotation (assigned to uTARs, see below). For a recent human gene annotation (GENCODE v30 hg38), only 2% of reads aligned outside of the annotation. Considerably less effort has been devoted to the genome annotations of less routinely studied species, as is evident from the percentage of sequences that fall outside of the most current genome assemblies for gray mouse lemur (7.1%), nake mole rat (4.2%), and purple sea urchin (10.7%) (Fig. 1D). We examined older human genome assembly releases and found a decrease in the percentage of reads that mapped outside of annotations, as annotations improved from 2004 to 2019 (Fig. 1E).

We sought to perform scRNA-seq analysis based on all transcriptional products detected by the assay, including those that are not included in annotations. To identify TARs, we implemented a HMM after alignment of the RNA-seq data to the reference genome (Methods)¹⁶. Fig. 1F shows an example of TARs identified for human genome build 16 along a segment of chromosome 22 comprising IGLL5. Through comparison to the genome annotation, we labeled TARs as either annotated (aTARs) or unannotated (uTARs). Finally, we counted the number of transcripts for each TAR in each cell in a dataset, creating a digital expression matrix.

We used Seurat to analyze single-cell gene and TAR expression profiles^17,18. We performed dimensionality reduction and single-cell clustering analysis on both TAR and gene expression profiles to determine the extent of biologically significant information contained within uTARs. We analyzed the Tabula Muris⁷ 10X generated dataset for the mouse spleen and kidney, human PBMC data available through 10X Genomics¹⁹, a mouse lemur lung dataset, a publicly available naked mole rat spleen dataset²⁰, and a publicly available sea urchin dataset²¹. In addition, we performed single-cell RNA-seq profiling of embryonic chicken heart tissue at different stages of development (days 4, 7, 10, and 14 post fertilization). Following data pre-processing (Methods), we implemented dimensionality reduction (principal component analysis) and used UMAP^22,23 to visualize cells in 2D. We labeled cell groups based on existing metadata or expression of canonical marker genes (Methods). We found a significant number of mm10 assembly bases annotated by uTAR features (Extended Data Fig. 1A). Comparison of UMAP visualizations of single-cell gene and uTAR expression profiles revealed that significant cell type dependent structure is retained in the uTAR UMAPs (Fig. 2A, Extended Data Fig. 1B). We computed silhouette values as a measure of the consistency within clusters of the data and found close agreement between silhouette values calculated for aTAR and gene expression-based clustering, as expected, but also good agreement between uTAR and gene expression based clustering for several datasets (Fig. 2B). This observation suggests that scRNA-seq reads outside of gene annotations contain significant biological information that can accurately separate cell types. Dimensional reduction on human PBMC uTAR expression reveals significant structure for the older hg16 genome build, but not for the most recent hg38 build, as expected given that hg16 is older and has less comprehensive annotation (Extended Data Fig. 1B). Dimensional reduction on mouse spleen uTAR expression reveals structure in many cell types such as T and B cells while dimensional reduction on mouse kidney uTARs reveals less structure. For mouse lemur lung, naked mole rat spleen, and sea urchin embryo, we find that dimensional reduction on single-cell uTAR expression profiles reveals near identical structures to that of annotated gene expression profiles. We conclude that for species that are not routinely studied and for which high quality gene annotations are not available, extensive biologically significant information is missed by conventional scRNA-seq analysis. For chicken heart tissues, we find that cell type clustering is conserved more strongly in uTARs for tissues collected at earlier stages in development. Day 4 uTARs perform the best in terms of separating cell types relative to other days while day 14 uTARs perform the worst. This observation reveals a high prevalence of unannotated transcripts in early progenitor cell types while the transcriptional programs of mature cell types in day 14 are captured much better by the current best annotation. This suggests that the current chicken gene annotations do not define the complete transcriptional state of early embryonic tissue where many progenitor and transitional cell types are present. The extent of cell type specific information that is missing from standard genome annotations depends on the genome annotation, organism, tissue type, and stage of development.

Figure 2 Reads in uTARs can separate cell types in different organisms.

A) UMAP dimensional reduction on annotated gene expression features (top row) and uTARs (second row) for mouse spleen, mouse kidney, different time points in chicken embryonic heart development, mouse lemur lung tissue, and sea urchin embryonic tissue. Cells are colored in each column based on gene expression clustering except for the mouse atlas data where cells are colored by organ. Relative number of uTAR reads for each cell in every cluster also shown as violin plots (third row, colors correspond to UMAPs). 6113 cells in mouse spleen, 610 cells in mouse kidney, 4365 in chicken day 4, 2198 in chicken day 14, 6321 in mouse lemur lungs, 2657 in naked mole rat spleen, and 2658 in sea urchin embryo. B) Silhouette coefficient values based on 2D UMAP coordinates of gene expression (blue), aTARs (red), and uTARs (maroon) for 11 samples. UMAPs for samples labeled with (*) are shown in Extended Data Fig. 1B. Cell labels are defined by gene annotation clustering. C) Correlation between top 5 PC loadings and pseudo-bulk read coverage of uTARs across 11 samples. Horizontal line at uTAR PC loading = 0.5, vertical line at uTAR pseudo-bulk read coverage = 1e+4, r²= 4.0e-3. Quadrant numbers represent the number of uTARs in respective quadrant. D) Relative percentage of uTARs containing homology to any sequence (blue) and mRNA sequences (light blue) as a function of log-e fold change expression for each cell type in naked mole rat spleen data. BLAST sequence homology results relative to nucleotide collection database thresholds: mean uTAR peak query length = 686±731 bps, uTAR peak percent identity > 71%, evalue < 0.053, bitscore > 52.8.

We next evaluated the agreement between the most significant uTARs identified based on total expression level from pseudo-bulk RNA-seq analysis and uTARs identified based on principal component (PC) loading values which correlate with their ability to resolve cell types in single-cell analysis. There is little overlap between uTARs with expression level greater than 10,000 reads (across all cells, pseudo-bulk analysis, Methods) and uTARs with the highest loading values in the first 5 principal components calculated from scRNA-seq analysis (PCs, loading values greater than 0.5, 329 out of 3618 uTARs shared, Fig. 2C). In addition, we observed no correlation between the total expression level of uTARs (pseudo-bulk read coverage) and their loading values in the first 5 principal components (Fig. 2C, correlation r²=4.0e-3, p-value < 2.2e-16). This analysis indicates that uTARs with high total expression are not necessarily those that best define cell types, demonstrating the utility of scRNA-seq analysis as a selection filter before annotation and validation.

We next used the Wilcox Rank Sum test to identify uTARs differentially expressed in the naked mole rat spleen dataset (Methods). We identified the largest peak within each differentially expressed uTAR and identified homologous sequences to the peak sequence using BLAST. This analysis revealed that uTARs that are more differentially expressed in any cell type (i.e. higher average log-e fold change expression relative to all other cell types) are also more likely to have sequence homology to transcripts contained within the NCBI nucleotide collection database, many of which have high homology to mRNA transcripts (Fig. 2D). We proceed to name uTARs with high differential expression in our datasets based on the BLAST results with the highest e-values and bit scores.

uTARs uncover potentially novel transcripts

We proceeded to extract and annotate uTAR features that define cell types. Cell clusters were first labeled by cell type based on expression of canonical gene markers or existing metadata. uTARs that were expressed in a minimum number of cells in each cluster (25% for mouse and chicken day 4, 50% for naked mole rat, lemur, and sea urchin) and at least 0.25-fold difference compared to the rest of the cells in log-e scale (0.25-fold change for chicken day 4, 0.50-fold change for naked mole rat and mouse, 0.75-fold change for lemur, 1-fold change for sea urchin) were identified. These differentially expressed uTARs were labeled based on nucleotide sequence homology using BLAST^24,25 on coverage peaks within the uTAR (Methods).

We explored uTARs that were differentially expressed between immune cell types within the spleen from the Tabula Muris dataset (Fig. 3A). We uncovered a uTAR differentially expressed in natural killer cells containing reads with high sequence homology to GTH1 which plays a role in protecting cells from reactive oxygen species²⁶. We also found a uTAR containing homology to PRPF8, differentially expressed in macrophages, which is a pre-mRNA splicing factor that is essential for the catalytic step II in pre-mRNA splicing²⁷. In addition, we found uTARs containing homology to SNX29 and ATF7IP2, which were both upregulated in a small population of dendritic cells in the mouse spleen. SNX29 is broadly associated with microtubule motor activity and phosphatidylinositol binding and ATF7IP2is a transcription factor that couples other transcription factors to the transcription machinery. Both genes have predicted mouse orthologs but are unannotated in the GENCODE vM21 genome annotation.

Figure 3 Biologically relevant information is contained in uTAR features.

Differential uTAR feature analysis for mouse spleen data (A), chicken heart day 4 data (B), gray mouse lemur EPCAM+ lung data (C), naked mole rat spleen data (D), and sea urchin embryo data (E). Dot plot (left) of differentially expressed uTAR features that are labelled based on sequence homology and cell clusters are numbered along the x-axis. Dot size corresponds to the percentage of cells that express the uTAR feature while darker blue color corresponds to higher level of log-e normalized expression. UMAP (second left) colored and dimensionally reduced using gene expression features where cell clusters are labeled above the UMAP. Total coverage plot (top) of 5 uTARs along the length of the uTAR feature on the x-axis. The corresponding feature plot on UMAP projection is shown below the coverage plots where darker brown color correlate with higher log-e normalized expression in each cell.

We explored whether differentially expressed uTARs found in early stages of embryonic heart development were also differentially expressed in later stages. Silhouette coefficient analysis of the chicken dataset shows that uTARs better separate cell types in early development (Fig. 2A, B, Extended Data Fig. 1B). We identified 18 differentially expressed uTARs in the day 4 chicken heart with high differential expression and labeled them based on sequence homology (Fig. 3B dotplot). These include CLEC14A, a potential tumor endothelial marker that plays a role in angiogenesis²⁸, SOX5, a transcription factor that regulates embryonic development and cell fate differentiation^29,30, and RUNX1T1, a transcriptional co-repressor that suppresses oncogenesis^31,32. While SOX5 and RUNX1T1 are both annotated elsewhere in the chicken genome, we found uTARs with high sequence homology to these genes. The end of the SOX5 uTAR is ~120kb upstream of the SOX5 annotation, which is ~340kb in length, and the end of the RUNX1T1 uTAR is ~50kb upstream of the annotated counterpart, which is ~87kb in length. This suggests that the SOX5 and RUNX1T1 uTARs may be additional exons of the existing annotations. When we examined the same set of differentially expressed uTARs in day 14 chicken heart, we found that most were not differentially expressed (Extended Data Fig. 1C) suggesting a role for these differentially expressed uTARs in early development. Additionally, some differentially expressed uTARs in day 4 also had lower total read coverage in the day 14 dataset such as RUNX1T1 and CLEC14A (Extended Data Fig. 1C coverage tracks). Thus, uTAR analysis can identify potentially novel development stage specific transcripts.

We also explored differentially expressed uTARs in the gray mouse lemur lung, naked mole rat spleen, and sea urchin embryo datasets. Several differentially expressed uTARs were found in the mouse lemur EPCAM+ lung cells including H3F3C and BST2 (Fig. 3C). H3F3C is a novel unannotated gene that plays a role in maintaining nucleosome structure^33,34 while BST2 is associated with interferon gamma and other cytokine signaling pathways in the immune system^35,36. Differentially expressed naked mole rat uTARs were also identified within neutrophils, T cells, and macrophages (Fig. 3D). Homology analysis revealed these included DUSP1, NATD1, and TRG. The DUSP1 uTAR is a novel feature missing from the HetGla_1.0 with high total expression in neutrophils (5778 total unique scRNA-seq reads in 2657 cells). The protein encoded by DUSP1 is a phosphatase with dual specificity for tyrosine and threonine and is a potential target for cancer therapy in humans^37,38. We also found several uTARs that are highly expressed in the pigment cells of an embryonic sea urchin (Fig. 3E). They include several uncharacterized genes such as LOC115926763 and LOC115920005. Therefore, our uTAR analysis reveals several unannotated features in understudied organisms such as the naked mole rat, gray mouse lemur, and sea urchin.

Our approach conservatively labels TARs as uTARs when they are completely outside of gene annotations and label TARs as aTARs when they overlap with gene annotations but have opposite directionality. We expanded the definition of uTARs to include directionality and repeated the analysis of uTAR expression in developing chicken heart and observed similar behavior as described above (Extended Data Fig. 2). uTARs tended to cluster cell types better in early stages of embryonic development compared to later stages (Extended Data Fig. 2A, B). We identified 34 differentially expressed uTARs in the day 4 developing chicken heart where most lose their differential expression in later timepoints of development (Extended Data Fig. 2C). 11 of these uTARs overlap in position with an annotated gene but are transcribed from the antisense strand of a gene in the existing annotations. These features share a similar differential expression with the annotated counterpart in day 4 (Extended Data Fig. 2D) suggesting that the transcription of these antisense uTARs are correlated with the transcription of the corresponding sense gene features.

Spatial transcriptomics reveals spatial co-expression of uTARs and canonical gene markers

We used data generated on the 10X Genomics Visium spatial transcriptomics platform to visualize the spatial co-expression of differentially expressed uTARs with canonical gene markers. 10 μm thick coronal tissue slices of embryonic heart at days 4 and 14 post fertilization were used for spatial transcriptomic analysis. The resulting data comprised 747 and 1995 barcoded spots for day 4 (5 hearts) and day 14 (1 heart) respectively. We found that the expression of a SH3BGR- related uTAR co-localized with that of the canonical cardiomyocyte marker TNNT2 at both day 4 and day 14 (Fig. 4A left), in line with observations from scRNA-seq (Fig. 3B, Extended Data Fig. 1C). We also observed clear expression of RUNX1T1 uTAR in the day 4 tissue section but no expression in the day 14 section, in concordance with the scRNA-seq data that suggested stage-dependent expression of this uTAR (Fig. 4A right). Expression of the RUNX1T1 uTAR at day 4 colocalized with a subset of COL1A1+ spots which is a marker for epicardial cells. Interestingly, the annotated RUNX1T1 gene has almost no expression in the spatial data revealing a discrepancy between annotated gene expression and uTAR expression. We quantified the correlation between the spatial expression of 11 uTARs in the day 4 sample and several canonical gene markers such as HBZ for red blood cells, TNNT2 for myocytes, and COL1A1 for epicardial progenitors (Pearson correlation, log-e normalized spot expression). We then performed hierarchical clustering on correlation values (Fig. 4B). We found that spatial uTAR expression in several cell types colocalized with canonical markers for those cells. Immune and immature fibroblast cell types are two of the 3 least abundant cell types according to the scRNA-seq data. As a result, their spatial correlation was low due to zero expression in most spots in the spatial data. Altogether, spatial transcriptomic data agreed with our findings of cell type and stage specific expression of highly expressed uTARs.

Figure 4 Spatial transcriptomics to map uTAR expression in chicken embryonic hearts.

A) Spatial log-e normalized expression of canonical TNNT2 myocytes marker, SH3BGR uTAR, canonical COL1A1 epicardial cells marker, RUNX1T1 uTAR, and annotated RUNX1T1 gene for chicken embryonic heart at day 4 (5 hearts) and day 14 (1 heart) post fertilization. B) Dendrogram computed on Pearson correlation of log-e normalized spatial expression for canonical gene markers and uTARs (underlined) in a day 4 chicken heart tissue section.

DISCUSSION

Current scRNA-seq analysis relies on the availability of high-quality gene annotation to identify cell types, study dynamic transcriptional behavior over time, and predict gene interactions. We developed a method to identify unannotated TARs that distinguish cell types in single-cell transcriptomic data. This approach is most relevant for the analysis of scRNA-seq datasets from less routinely studied organisms, and datasets from developing tissues that often comprise, as we show here, transcriptional products that are not represented in gene annotations. We present proof-of principle applications of this procedure to study unannotated transcripts in embryonic development and to explore previously unknown genomic features in several species.

The strategy we propose broadly identifies transcriptionally active regions, without attempting to filter, curate, or characterize those regions, and then employs single-cell expression analyses to identify regions outside of the gene annotation that are relevant for the identification of cell states or types. Other transcriptionally active regions may encode transcripts that are broadly expressed in all cell types. This strategy allows focusing annotation, curation, and validation efforts to those regions that are biologically significant. Our procedure expands the scope of scRNA-seq to tissues of understudied species, cell types, and development stages, enhancing the range of new biology that can be uncovered using these powerful approaches. In addition, our work points to an application for scRNA-seq to refine existing gene annotations, or to create de novo gene annotations.

METHODS

Publicly available datasets

Fastq files for the human PBMC dataset were downloaded directly from the 10X Genomics library of single cell gene expression data. Tabula Muris alignment files (BAMs) for 10X Genomics droplet generated data were downloaded from the Gene Expression Omnibus (GSE109774) and fastq files were extracted using the 10X Genomics bamtofastq tool³⁹. Mouse uTARs were generated based on combining alignment files across all droplet generated data and scRNA-seq analysis for kidney (10X_P4_5) and spleen (10X_P4_6) samples are shown. Fastq files for the naked mole rat and sea urchin datasets were downloaded from GEO listed in their respective publications (GSM3885302 and GSE134350 respectively). Naked mole rat uTARs were generated based on combining alignment files for the spleen samples (SRR9291380, SRR9291381, SRR9291382, SRR9291383, SRR9291384, SRR9291385, SRR9291386, SRR9291387) and scRNA-seq analysis for sample SRR9291380 (nmr_1.1) is shown. Sea urchin embryo uTARs were generate based on combining alignment files generated from SRR9693264, SRR9693265, and SRR9693266 and scRNA-seq analysis for sample SRR9693264 (D1) is shown. Mouse lemur lung tissue uTARs were generated from the Tabula Microcebus consortium by combining MLCA_ANTOINE_LUNG_EPCAM_POS_S12, MLCA_ANTOINE_LUNG_CD31_POS_S11, MLCA_ANTOINE_LUNG_P3_S7 datasets and scRNA-seq analysis for MLCA_ANTOINE_LUNG_EPCAM_POS_S12 is shown.

Generation of chicken embryonic heart scRNA-seq data

Fertile white leghorn chicken eggs were incubated using an egg incubator that controls humidity and temperature until the embryonic day of interest. Chicken embryonic ventricular free walls were excised aseptically in ice cold Hank’s Balanced Salt Solution (HBSS) and then minced into one-millimeter tissue fragments. Six dozen day 4, Hamburger-Hamilton developmental stage 21-23 (HH21-23), whole ventricles, four dozen day 7 (HH30-31) left and right ventricles, three dozen day 10 (HH35-36) left and right ventricles, and one dozen day 14 (HH40) left and right ventricles were respectively pooled for seven total samples to be analyzed via single cell RNA sequencing. The day 4 and day 7 ventricular tissue fragments were digested under constant mild agitation at 37°C in 1.5mg/mL collagenase type II/dispase (Roche) for one cycle of 20 minutes and one cycle of 10 minutes, while the day 10 and day 14 ventricular tissue fragments were digested in 300U/mL collagenase type II (Worthington Biochemical Corporation) for four cycles of 10 minutes to dissociate the cells from the tissue. Cells were then passed through a 40μm filter and centrifuged into a pellet. For all samples, blood was removed by resuspension in an ACK lysis buffer followed by centrifugation and two washes in 1X phosphate buffered saline with 0.04% bovine serum albumin (1XPBS/BSA). For the day 14 samples, cells underwent fluorescence-activated cell sorting (FACs) to sort for live, non-apoptotic cells. Prior to FACs, the day 14 samples were first stained for 10 minutes in 4 μg/mL Calcein AM at 37°C in HBSS with 2% fetal bovine serum (HBSS/FBS), washed in HBSS/FBS, centrifuged, stained with 1μg/mL of 7AAD on ice in HBSS/FBS for at least 30 minutes, and then immediately cell sorted in HBSS/FBS. For all samples, single cell suspensions were then resuspended at 1 × 10⁶ cells/mL in 1XPBS/BSA prior to single cell RNA sequencing. The 10X Genomics gene expression Library v2 Kit was used to isolate day 10 and day 14 samples. The 10X Genomics gene expression Library v3 Kit was used to isolate 4DPF and 7DPF samples. Illumina NextSeq500 paired-end sequencing was used to sequence each sample. Chicken uTARs were generated by combining sequence alignment files for all samples. scRNA-seq analysis shown for the right ventricle of day 7, 10, and 14.

Sequence alignment and generation of expression matrices

Fastq sequence files were processed (adapter trimming, barcode tagging) using the Drop-seq⁴⁰ suite of computational tools and were aligned to respective genomes using STAR without gene annotation indexing. Gene expression matrices were generated based on gene annotations and TAR expression matrices were generated based on TAR annotations (described below) using Drop-seq tools.

Identifying TARs from alignment files using groHMM

We used groHMM¹⁶ to predict transcribed regions from aligned scRNA-seq data in a strand specific manner without annotations. Uniquely mapped reads from multiple scRNA-seq data were merged (i.e. without consideration for coverage of individual cells). We down sampled the combined alignment file of the Tabula Muris dataset and kept 5% of all uniquely aligned reads to account for the computational memory constraints of the groHMM tool. Then, we used groHMM to scan the mapped read counts along the genome with window size 50bp without overlap. The emission probabilities were modeled by a gamma distribution. The gamma distribution parameters and transition probabilities were learned using the Baum-Welch expectation maximization (EM) algorithm. We predicted the transcribed regions from both the sense and antisense strands. Predicted transcribed regions within 500bp were merged using bedtools⁴¹ merge (parameters: -s -d 500). The coverage of each region was calculated using bedtools coverage (parameters: -s -counts -split). Transcribed regions with at least n reads were kept and used as TAR features in the following analysis, where n is set as 1/10,000,000 of uniquely mapped reads in the combined alignment file derived from samtools⁴² view (parameters: -q 255).

TARs identified using the groHMM algorithm were labelled as annotated TAR (aTAR) or unannotated TAR (uTAR) features based on their overlap with existing gene annotations. The refFlat genome annotation format was used in all cases where the genomic start and end position of each transcript is recorded. HMM features that have any overlap with an existing gene annotation transcript based strictly on transcription start and stop sites were labelled as “aTAR”, without accounting for strandedness. HMM features that overlap with an existing gene annotation but were on opposing strands were also labeled “aTAR”. All other HMM features were labelled as “uTAR”.

Single-cell transcriptome analysis

We used the Seurat suite of scRNA-seq tools^17,18 to analyze gene expression and TAR expression data. Cells were filtered based on the number of gene expression features and TAR features were filtered based on being expressed in at least 2 cells. Cell filtering parameters were different for each dataset to optimize downstream analysis and to account for different protocols. Tabula Muris cells were filtered and labeled based on metadata information provided in the paper. Human PBMC cells were filtered based on having between 201-2599 gene features and less than 5% mitochondrial transcripts according to the hg38 assembly and GENCODEv30 annotations. Chicken cells were filtered based on having more than 200 gene features and less than 20% mitochondrial transcripts. Gray mouse lemur cells were filtered based on having between 201-2499 gene features. Naked mole rat cells were filtered based on having between 401-2499 gene features. Sea urchin cells were filtered based on having between 401-2499 gene features. Gene and TAR combined expression matrices were log-normalized and scaled before further processing. For the human PBMC dataset, 4000 variable features (including genes and TARs) were found based on mean-variance ratios. PCA was performed on variable gene features, aTARs, and uTARs in the human PBMC dataset. PCA was performed using all genes, aTARs, and uTARs in the other datasets. Cells were clustered using a KNN graph-based clustering approach with the first 10 PCs derived from gene expression reduction. UMAP projections were derived from the first 10 PCs using either gene expression, aTAR expression, or uTAR expression.

Identifying differentially expressed uTARs and labelling uTAR

We used the Wilcoxon Rank Sum test to identify differentially expressed genes and TARs in every cell cluster, filtered based on increased expression as compared to all other cells. Canonical marker gene analysis was performed to identify cell clusters in days 4 and 14 chicken heart, naked mole rat spleen, and gray mouse lemur lung datasets. Differential mouse spleen uTARs were filtered by being in at least 25% of cells in the cluster and more than 0.5 log-e fold change compared to the rest of the cells. Differential day 4 chicken uTARs were further filtered by being in at least 25% of cells in the cluster and more than 0.25 log-e fold change compared to the rest of the cells. Lemur lung differential uTARs were further filtered based on being in at least 50% of cells in the cluster and more than 0.75 log-e fold change compared to the rest of the cells. Naked mole rat differential uTARs were further filtered based on being in at least 50% of cells in the cluster and more than 0.5 log-e fold change compared to the rest of the cells. Sea urchin differential uTARs were further filtered based on being in at least 50% of cells in the cluster and more than 1 log-e fold change compared to the rest of the cells. We manually examined the coverage of filtered uTARs, selected regions with coverage peaks, and used BLASTn to find nucleotide sequence homology with the nucleotide collection database to label differential uTAR features.

Pseudo-bulk versus single-cell derived uTAR comparison

uTAR pseudo-bulk coverage was calculated using bedtools coverage on the merged alignment files for each experiment. uTAR PC loadings were calculated based on the sum of absolute PC loading values in the first 5 PCs based on uTAR PC analysis.

10X Genomics Visium spatial transcriptome sample and library generation

Whole hearts were dissected, placed in sterile Hank’s Balanced Salt Solution, and perfused through the apex to remove blood. Fresh samples were embedded in Optimal Cutting Compound (OCT) and frozen with liquid-nitrogen-cooled isopentane before sectioning into 10 μm slices using Thermo Scientific™ CryoStar NX50 cryostat. Sections were mounted on −20°C cooled Visium slides. Five sections were mounted for day 4, four sections for day 7, two sections for day 10, and one section for day 14 post fertilization. cDNA libraries were generated using 10x Genomics Visium Spatial Gene Expression 3’ Library Construction V1 Kit. Haemotoxylin and Eosin stained heart sections were imaged using Zeiss PALM MicroBeam laser capture microscope and images were processed using Fiji from ImageJ. Illumina NextSeq 500/550 was used to sequence the cDNA libraries with 150 cycle high output kits (Read 1 = 120, Read 2 = 5, Index 1 = 14 and Index 2 = 8). Sequencing files were processed using 10X Genomics Space Ranger pipeline. TAR annotations were generated by running groHMM on the combined Space Ranger derived alignment files across the 4 time points. TAR and GRCg6a Ensembl v96 annotations were combined to create one set of genome annotation. Spatially tagged TAR and gene expression matrices were generated with the combined annotation set following the Space Ranger pipeline.

Spatial transcriptome processing and visualization

We used Seurat spatial transcriptome tools to visualize spatially tagged expression data. Feature expression matrices were log-e normalized before further processing. We identified spatial uTARs that overlap with differentially expressed scRNA-seq uTARs in chromosome position and strandedness. If several spatial uTARs overlap with a scRNA-seq uTAR, we visualized and calculated spatial correlations based on the spatial uTAR with the highest total expression.

TAR-scRNA-seq tool and scripts

https://github.com/fw262/TAR-scRNA-seq

DATA AVAILABILITY

The chicken related sequencing data discussed in this publication have been deposited in NCBI’s Gene Expression Omnibus and are accessible through GEO Series accession number GSE149457. H&E stained tissue images for spatial RNA-seq datasets have been made available through github.

AUTHOR CONTRIBUTIONS

M.F.Z.W, J.T.B, C.G.D, and I.D.V designed the study. M.F.Z.W, M.M, and G.J.S carried out the experiments. M.F.Z.W, M.M, S.C, and D.M analyzed the data. M.F.Z.W and I.D.V wrote the manuscript. All authors provided feedback and comments.

COMPETING INTERESTS

The authors declare no competing interests.

ACKNOWLEDGEMENTS

We thank P. Schweitzer and colleagues at the Cornell Biotechnology Resource Center (BRC) for help with sequencing assays. We thank Phillip S. Burnham, Hao Shi, Alexandre P. Cheng, Adrienne Chang, Benjamin Grodner, Olga Botvinnik, Camille Sophie Ezran, Angela Wu, Mark Krasnow, and the Lemur Cell Atlas Consortium for discussions and feedback. This work was supported by R33CA235302 (to I.D.V.), R21AI133331 (to I.D.V.), DP2AI138242 (to I.D.V.), and a National Sciences and Engineering Research Council of Canada fellowship PGS-D2 (to M.F.Z.W).

REFERENCES

1.↵
Rosenberg, A. B. et al. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360, 176–182 (2018).
OpenUrl Abstract/FREE Full Text
2.
Gierahn, T. M. et al. Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput. Nature Methods 14, 395–398 (2017).
OpenUrl
3.
Jaitin, D. A. et al. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science 343, 776–779 (2014).
OpenUrl Abstract/FREE Full Text
4.
Klein, A. M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).
OpenUrl CrossRef PubMed
5.
Cao, J. et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357, 661–667 (2017).
OpenUrl Abstract/FREE Full Text
6.
Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).
OpenUrl CrossRef PubMed Web of Science
7.↵
Tabula Muris Consortium et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).
OpenUrl CrossRef PubMed
8.
González-Silva, L., Quevedo, L. & Varela, I. Tumor Functional Heterogeneity Unraveled by scRNA-seq Technologies. Trends in Cancer 6, 13–19 (2020).
OpenUrl
9.↵
Xue, Z. et al. Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing. Nature 500, 593–597 (2013).
OpenUrl CrossRef PubMed Web of Science
10.↵
Liu, S. & Trapnell, C. Single-cell transcriptome sequencing: recent advances and remaining challenges. F1000Res 5, (2016).
11.
Menon, M. et al. Single-cell transcriptomic atlas of the human retina identifies cell types associated with age-related macular degeneration. Nat Commun 10, 4902 (2019).
OpenUrl CrossRef PubMed
12.
Kanton, S. et al. Organoid single-cell genomic atlas uncovers human-specific features of brain development. Nature 574, 418–422 (2019).
OpenUrl CrossRef PubMed
13.
HuBMAP Consortium. The human body at cellular resolution: the NIH Human Biomolecular Atlas Program. Nature 574, 187–192 (2019).
OpenUrl CrossRef PubMed
14.↵
Saunders, A. et al. Molecular Diversity and Specializations among the Cells of the Adult Mouse Brain. Cell 174, 1015–1030.e16 (2018).
OpenUrl CrossRef PubMed
15.↵
Ensembl annotation. https://useast.ensembl.org/info/genome/genebuild/genome_annotation.html.
16.↵
Chae, M., Danko, C. G. & Kraus, W. L. groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data. BMC Bioinformatics 16, 222 (2015).
OpenUrl CrossRef PubMed
17.↵
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nature Biotechnology 36, 411–420 (2018).
OpenUrl CrossRef PubMed
18.↵
Stuart, T. et al. Comprehensive Integration of Single-Cell Data. Cell 177, 1888–1902.e21 (2019).
OpenUrl CrossRef PubMed
19.↵
pbmc4k -Datasets -Single Cell Gene Expression -Official 10x Genomics Support. https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc4k.
20.↵
Hilton, H. G. et al. Single-cell transcriptomics of the naked mole-rat reveals unexpected features of mammalian immunity. PLOS Biology 17, e3000528 (2019).
OpenUrl
21.↵
Foster, S., Teo, Y. V., Neretti, N., Oulhen, N. & Wessel, G. M. Single cell RNA-seq in the sea urchin embryo show marked cell-type specificity in the Delta/Notch pathway. Molecular Reproduction and Development 86, 931–934 (2019).
OpenUrl CrossRef
22.↵
Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nature Biotechnology (2018) doi:10.1038/nbt.4314.
OpenUrl CrossRef PubMed
23.↵
McInnes, L. & Healy, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426 [cs, stat] (2018).
24.↵
Johnson, M. et al. NCBI BLAST: a better web interface. Nucleic Acids Res 36, W5–W9 (2008).
OpenUrl CrossRef PubMed Web of Science
25.↵
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
OpenUrl CrossRef PubMed Web of Science
26.↵
Board, P. G. & Menon, D. Glutathione transferases, regulators of cellular metabolism and physiology. Biochim. Biophys. Acta 1830, 3267–3288 (2013).
OpenUrl CrossRef PubMed
27.↵
Wickramasinghe, V. O. et al. Regulation of constitutive and alternative mRNA splicing across the human transcriptome by PRPF8 is determined by 5’ splice site strength. Genome Biol. 16, 201 (2015).
OpenUrl CrossRef PubMed
28.↵
Mura, M. et al. Identification and angiogenic role of the novel tumor endothelial marker CLEC14A. Oncogene 31, 293–305 (2012).
OpenUrl CrossRef PubMed
29.↵
Rakhmanov, M. et al. High levels of SOX5 decrease proliferative capacity of human B cells, but permit plasmablast differentiation. PLoS ONE 9, e100328 (2014).
OpenUrl
30.↵
Zhang, D. & Liu, S. SOX5 promotes epithelial-mesenchymal transition in osteosarcoma via regulation of Snail. J BUON 22, 258–264 (2017).
OpenUrl
31.↵
Alfayez, M., Vishnubalaji, R. & Alajez, N. M. Runt-related Transcription Factor 1 (RUNX1T1) Suppresses Colorectal Cancer Cells Through Regulation of Cell Proliferation and Chemotherapeutic Drug Resistance. Anticancer Res. 36, 5257–5263 (2016).
OpenUrl Abstract/FREE Full Text
32.↵
Shi, J. et al. Long non-coding RNA RUNX1-IT1 plays a tumour-suppressive role in colorectal cancer by inhibiting cell proliferation and migration. Cell Biochem. Funct. 37, 11–20 (2019).
OpenUrl
33.↵
Schenk, R., Jenke, A., Zilbauer, M., Wirth, S. & Postberg, J. H3.5 is a novel hominid-specific histone H3 variant that is specifically expressed in the seminiferous tubules of human testes. Chromosoma 120, 275–285 (2011).
OpenUrl CrossRef PubMed
34.↵
Shiraishi, K. et al. Roles of histone H3.5 in human spermatogenesis and spermatogenic disorders. Andrology 6, 158–165 (2018).
OpenUrl
35.↵
Mahauad-Fernandez, W. D. et al. BST-2 promotes survival in circulation and pulmonary metastatic seeding of breast cancer cells. Sci Rep 8, 17608 (2018).
OpenUrl
36.↵
Tiwari, R., de la Torre, J. C., McGavern, D. B. & Nayak, D. Beyond Tethering the Viral Particles: Immunomodulatory Functions of Tetherin (BST-2). DNA Cell Biol. 38, 1170–1177 (2019).
OpenUrl
37.↵
Cheng, C.-M., Liu, F., Li, J.-Y. & Song, Q.-Y. DUSP1 promotes senescence of retinoblastoma cell line SO-Rb5 cells by activating AKT signaling pathway. Eur Rev Med Pharmacol Sci 22, 7628–7632 (2018).
OpenUrl
38.↵
Teng, F. et al. DUSP1 induces apatinib resistance by activating the MAPK pathway in gastric cancer. Oncol. Rep. 40, 1203–1222 (2018).
OpenUrl
39.↵
10x Genomics Support -Official 10x Genomics Support. https://support.10xgenomics.com/docs/bamtofastq.
40.↵
Macosko, E. Z. et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell 161, 1202–1214 (2015).
OpenUrl CrossRef PubMed
41.↵
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
OpenUrl CrossRef PubMed Web of Science
42.↵
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
OpenUrl CrossRef PubMed Web of Science

View the discussion thread.

Posted August 03, 2020.

Download PDF

Supplementary Material

Citation Tools

Subject Area

Bioinformatics

Subject Areas

All Articles

Animal Behavior and Cognition (5216)
Biochemistry (11753)
Bioengineering (8754)
Bioinformatics (29205)
Biophysics (14975)
Cancer Biology (12102)
Cell Biology (17414)
Clinical Trials (138)
Developmental Biology (9423)
Ecology (14185)
Epidemiology (2067)
Evolutionary Biology (18309)
Genetics (12246)
Genomics (16805)
Immunology (11870)
Microbiology (28098)
Molecular Biology (11598)
Neuroscience (60979)
Paleontology (452)
Pathology (1871)
Pharmacology and Toxicology (3238)
Physiology (4960)
Plant Biology (10427)
Scientific Communication and Education (1683)
Synthetic Biology (2886)
Systems Biology (7341)
Zoology (1651)

[1] 1.↵
Rosenberg, A. B. et al. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360, 176–182 (2018).
OpenUrl Abstract/FREE Full Text

[2] 2.
Gierahn, T. M. et al. Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput. Nature Methods 14, 395–398 (2017).
OpenUrl

[3] 3.
Jaitin, D. A. et al. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science 343, 776–779 (2014).
OpenUrl Abstract/FREE Full Text

[4] 4.
Klein, A. M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).
OpenUrl CrossRef PubMed

[5] 5.
Cao, J. et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357, 661–667 (2017).
OpenUrl Abstract/FREE Full Text

[6] 6.
Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).
OpenUrl CrossRef PubMed Web of Science

[7] 7.↵
Tabula Muris Consortium et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).
OpenUrl CrossRef PubMed

[8] 8.
González-Silva, L., Quevedo, L. & Varela, I. Tumor Functional Heterogeneity Unraveled by scRNA-seq Technologies. Trends in Cancer 6, 13–19 (2020).
OpenUrl

[9] 9.↵
Xue, Z. et al. Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing. Nature 500, 593–597 (2013).
OpenUrl CrossRef PubMed Web of Science

[10] 10.↵
Liu, S. & Trapnell, C. Single-cell transcriptome sequencing: recent advances and remaining challenges. F1000Res 5, (2016).

[11] 11.
Menon, M. et al. Single-cell transcriptomic atlas of the human retina identifies cell types associated with age-related macular degeneration. Nat Commun 10, 4902 (2019).
OpenUrl CrossRef PubMed

[12] 12.
Kanton, S. et al. Organoid single-cell genomic atlas uncovers human-specific features of brain development. Nature 574, 418–422 (2019).
OpenUrl CrossRef PubMed

[13] 13.
HuBMAP Consortium. The human body at cellular resolution: the NIH Human Biomolecular Atlas Program. Nature 574, 187–192 (2019).
OpenUrl CrossRef PubMed

[14] 14.↵
Saunders, A. et al. Molecular Diversity and Specializations among the Cells of the Adult Mouse Brain. Cell 174, 1015–1030.e16 (2018).
OpenUrl CrossRef PubMed

[15] 15.↵
Ensembl annotation. https://useast.ensembl.org/info/genome/genebuild/genome_annotation.html.

[16] 16.↵
Chae, M., Danko, C. G. & Kraus, W. L. groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data. BMC Bioinformatics 16, 222 (2015).
OpenUrl CrossRef PubMed

[17] 17.↵
Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nature Biotechnology 36, 411–420 (2018).
OpenUrl CrossRef PubMed

[18] 18.↵
Stuart, T. et al. Comprehensive Integration of Single-Cell Data. Cell 177, 1888–1902.e21 (2019).
OpenUrl CrossRef PubMed

[19] 19.↵
pbmc4k -Datasets -Single Cell Gene Expression -Official 10x Genomics Support. https://support.10xgenomics.com/single-cell-gene-expression/datasets/2.1.0/pbmc4k.

[20] 20.↵
Hilton, H. G. et al. Single-cell transcriptomics of the naked mole-rat reveals unexpected features of mammalian immunity. PLOS Biology 17, e3000528 (2019).
OpenUrl

[21] 21.↵
Foster, S., Teo, Y. V., Neretti, N., Oulhen, N. & Wessel, G. M. Single cell RNA-seq in the sea urchin embryo show marked cell-type specificity in the Delta/Notch pathway. Molecular Reproduction and Development 86, 931–934 (2019).
OpenUrl CrossRef

[22] 22.↵
Becht, E. et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nature Biotechnology (2018) doi:10.1038/nbt.4314.
OpenUrl CrossRef PubMed

[23] 23.↵
McInnes, L. & Healy, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426 [cs, stat] (2018).

[24] 24.↵
Johnson, M. et al. NCBI BLAST: a better web interface. Nucleic Acids Res 36, W5–W9 (2008).
OpenUrl CrossRef PubMed Web of Science

[25] 25.↵
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
OpenUrl CrossRef PubMed Web of Science

[26] 26.↵
Board, P. G. & Menon, D. Glutathione transferases, regulators of cellular metabolism and physiology. Biochim. Biophys. Acta 1830, 3267–3288 (2013).
OpenUrl CrossRef PubMed

[27] 27.↵
Wickramasinghe, V. O. et al. Regulation of constitutive and alternative mRNA splicing across the human transcriptome by PRPF8 is determined by 5’ splice site strength. Genome Biol. 16, 201 (2015).
OpenUrl CrossRef PubMed

[28] 28.↵
Mura, M. et al. Identification and angiogenic role of the novel tumor endothelial marker CLEC14A. Oncogene 31, 293–305 (2012).
OpenUrl CrossRef PubMed

[29] 29.↵
Rakhmanov, M. et al. High levels of SOX5 decrease proliferative capacity of human B cells, but permit plasmablast differentiation. PLoS ONE 9, e100328 (2014).
OpenUrl

[30] 30.↵
Zhang, D. & Liu, S. SOX5 promotes epithelial-mesenchymal transition in osteosarcoma via regulation of Snail. J BUON 22, 258–264 (2017).
OpenUrl

[31] 31.↵
Alfayez, M., Vishnubalaji, R. & Alajez, N. M. Runt-related Transcription Factor 1 (RUNX1T1) Suppresses Colorectal Cancer Cells Through Regulation of Cell Proliferation and Chemotherapeutic Drug Resistance. Anticancer Res. 36, 5257–5263 (2016).
OpenUrl Abstract/FREE Full Text

[32] 32.↵
Shi, J. et al. Long non-coding RNA RUNX1-IT1 plays a tumour-suppressive role in colorectal cancer by inhibiting cell proliferation and migration. Cell Biochem. Funct. 37, 11–20 (2019).
OpenUrl

[33] 33.↵
Schenk, R., Jenke, A., Zilbauer, M., Wirth, S. & Postberg, J. H3.5 is a novel hominid-specific histone H3 variant that is specifically expressed in the seminiferous tubules of human testes. Chromosoma 120, 275–285 (2011).
OpenUrl CrossRef PubMed

[34] 34.↵
Shiraishi, K. et al. Roles of histone H3.5 in human spermatogenesis and spermatogenic disorders. Andrology 6, 158–165 (2018).
OpenUrl

[35] 35.↵
Mahauad-Fernandez, W. D. et al. BST-2 promotes survival in circulation and pulmonary metastatic seeding of breast cancer cells. Sci Rep 8, 17608 (2018).
OpenUrl

[36] 36.↵
Tiwari, R., de la Torre, J. C., McGavern, D. B. & Nayak, D. Beyond Tethering the Viral Particles: Immunomodulatory Functions of Tetherin (BST-2). DNA Cell Biol. 38, 1170–1177 (2019).
OpenUrl

[37] 37.↵
Cheng, C.-M., Liu, F., Li, J.-Y. & Song, Q.-Y. DUSP1 promotes senescence of retinoblastoma cell line SO-Rb5 cells by activating AKT signaling pathway. Eur Rev Med Pharmacol Sci 22, 7628–7632 (2018).
OpenUrl

[38] 38.↵
Teng, F. et al. DUSP1 induces apatinib resistance by activating the MAPK pathway in gastric cancer. Oncol. Rep. 40, 1203–1222 (2018).
OpenUrl

[39] 39.↵
10x Genomics Support -Official 10x Genomics Support. https://support.10xgenomics.com/docs/bamtofastq.

[40] 40.↵
Macosko, E. Z. et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell 161, 1202–1214 (2015).
OpenUrl CrossRef PubMed

[41] 41.↵
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
OpenUrl CrossRef PubMed Web of Science

[42] 42.↵
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
OpenUrl CrossRef PubMed Web of Science