Abstract
Oogenesis is a complex developmental process that involves spatiotemporally regulated coordination between the germline and supporting, somatic cell populations. This process has been modelled extensively using the Drosophila ovary. While different ovarian cell types have been identified through traditional means, the large-scale expression profiles underlying each cell type remain unknown. Using single-cell RNA sequencing technology, we have built a transcriptomic dataset for the adult Drosophila ovary and connected tissues. This dataset captures the entire transcriptional trajectory of the developing follicle cell population over time. Our findings provide detailed insight into processes such as cell-cycle switching, migration, symmetry breaking, nurse cell engulfment, egg-shell formation, and signaling during corpus luteum formation, marking a newly identified oogenesis-to-ovulation transition. Altogether, these findings provide a broad perspective on oogenesis at a single-cell resolution while revealing new genetic markers and fate-specific transcriptional signatures to facilitate future studies.
Introduction
The adult Drosophila ovary is a versatile model used in many biological studies. With powerful genetic tools available in Drosophila, studies of oogenesis have provided mechanistic insight into topics such as stem cell niche regulation [31, 60, 71, 76, 101], cell differentiation [3, 52], cell cycle and size control [9, 17], epithelial morphogenesis [46, 92, 100], cell migration [45, 66], and tissue repair and homeostasis [88]. The success of this system as a developmental model is also due to the structure of the fly ovary, where eggs progress in sequence and many rounds of oogenesis occur simultaneously. This provides a unique advantage over other systems, because temporal resolution and replicative power can be achieved easily within a single ovary.
A female fly has a pair of ovaries that are connected to the oviduct and held together by muscles known as the peritoneal sheath. Each ovary is made up of developmental units called ovarioles, which are individually sheathed within the musculature known as the epithelial sheath. Oogenesis occurs simultaneously within each of the 16-18 ovarioles, progressing from stem cells at the anterior tip to fully developed eggs at the posterior end. Throughout oogenesis, the developing egg is supported by the germline-derived nurse cells and the somatic follicular epithelium (made up of follicle cells). Together, the germline and the follicle cells form individual units called egg chambers. Egg chamber development is subdivided into early (1-6), middle (7-10A), and late (10B-14) stages based on mitotic, endocycle, and gene amplification cell-cycle programs of the follicle cells, respectively [47]. During ovulation, mature eggs break free from the epithelium and pass into the uterus through the oviduct. The epithelial layer remains in the ovary, forming a structure similar to one found in mammals, known as the corpus luteum [23].
To better understand how oogenesis is regulated at the cellular level, we performed single-cell RNA sequencing (scRNA-seq) on these ovarian cell types and uncovered novel gene expression patterns throughout oogenesis. With a special focus on the follicle cell trajectory, we also described the major transcriptomic programs underlying the early, middle, and late stages of oogenesis. We further report a newly identified transcriptional shift in late-staged follicle cells (termed pre-corpus luteum cells), which begin upregulating ovulation-related genes.
Materials and methods
Experimental Model
Fly lines used for ScRNA-seq
All fly stocks and crosses were maintained at room temperature (23°C) and fed a yeast-based medium. To construct the scRNA-seq dataset, we used w− flies (BL#3605), a common genetic background used in many studies [48].
Fly lines used in experimental validation of cluster markers
We used a variety of publicly available lines from Bloomington Stock Center to experimentally validate expression patterns of select genes from the scRNA-seq dataset. These lines fall into two categories: those with fluorescently tagged proteins under the control of a native promoter (either MiMIC-based RMCE [96] or protein trap [13]) and those expressing T2A-Gal4 (carrying either CRISPR-mediated insertions of T2A-Gal4 [57] or RMCE-mediated swap-ins of T2A-Gal4 [27]) driving UAS-GFP (BL#4775) or UAS-RFP (BL#31417) as a marker.
The GFP-tagged lines used in this study are Atf3:GFP (BL#42263), Ilp8:GFP (BL#33079), Past1:GFP (BL#51521), Glut4EF:GFP (BL#60555), abd-A:GFP (BL#68187), Chrac-16:GFP (BL#56160), shep:GFP (BL#61769), AdenoK:GFP (BL#56160), Fkbp1:GFP (BL#66358), mub:GFP (BL#51574), mnb:GFP (BL#66769), Gp210:GFP (BL#61651), Fpps:GFP (BL#51527), HmgD:GFP (BL#55827), sli:GFP (BL#64472), Nrx-IV:GFP (BL#50798), CG14207:GFP (BL#60226), D1:GFP (BL#66454), jumu:GFP (BL#59764), hdc:GFP (BL#59762), sm:GFP (BL#59815), Men:GFP (BL#61754), Sap-r:GFP (BL#63201), GILT1:GFP (BL#51543), Cp1:GFP (BL#51555). The T2A-Gal4 lines used in this study are Ance-Gal4 (BL#76676), FER-Gal4 (BL#67448), wb-Gal4 (BL#76189), stx-Gal4 (BL#77769), vir-1-Gal4 (BL#65650).
We also used Diap1:GFP, a kind gift from the Jin Jiang Lab [105].
Immunofluorescence and imaging
Ovaries and associated tissue were dissected in PBS, fixed for 15 minutes in 4% formaldehyde, washed 3 times in PBT, and then stained with DAPI (Invitrogen, 1:1000) to label nuclei. Samples were then mounted on slides in an 80% glycerol mounting solution. All images were captured using the Zeiss LSM 800 confocal microscope and associated Zeiss microscope software (ZEN blue).
ScRNA-seq sample preparation
Dissociation and filtration of single cells
To maximize the genetic diversity sampled across individuals, we dissected ovarian tissue from 50 adult flies. It is technically challenging to separate the ovaries from surrounding and interconnected tissues (i.e. fat body, muscle sheath, and oviduct) without damaging the ovarian cells. Thus, in order to minimize damage or death to ovarian cell types of interest, we elected to include these surrounding cell types in our analysis.
Female flies were selected on the day of eclosion and maintained at 25°C with access to males and yeast supplement for 3 days (a common experimental condition in many studies). Flies were then dissected in complete medium (Grace’s Insect Basal Medium supplemented with 15% FBS). To prevent cell clumping, ovaries were transferred to a tube containing 300 µl EBSS (no calcium, magnesium, or phenol red) and gently washed for 2 minutes. The EBSS was then removed and the tissue was dissociated in 100 µl Papain (50 U/mL in EBSS, previously heat-activated at 37°C for 15 minutes) for 30 minutes. The suspension was mechanically dissociated every 3 minutes by gentle pipetting up and down. To quench the digestion, 500 µl complete medium was added to the dissociated cells. The suspension was then passed through a 40 µm sterile cell strainer and centrifuged for 10 minutes at 700 RCF to remove large, undissociated eggs (with eggshell) and debris. This also filtered out larger germline cells, which increase dramatically in size around stage 9 [53]. Supernatant was removed and single cells were re-suspended in 100 µl. Cell viability was assayed using Trypan Blue and estimates of cell concentration were made using a hemocytometer. Cells were then further diluted to an approximate final concentration of 2,000 cells/µl according to 10X Genomics recommendations.
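The counting and dilution arithmetic in this step can be sketched as follows. This is a minimal illustration rather than the protocol actually followed: it assumes a standard hemocytometer in which one large square holds 0.1 µl, and the function names, the 1:1 Trypan Blue mix, and the example count are hypothetical.

```python
def cells_per_ul(mean_count_per_square, stain_dilution=2.0):
    """Convert a hemocytometer count to cells/µl.

    Assumes one large square holds 0.1 µl (standard chamber geometry),
    so count / 0.1 µl = count * 10 cells/µl. stain_dilution corrects for
    mixing the sample with Trypan Blue (2.0 for an assumed 1:1 mix).
    """
    return mean_count_per_square * stain_dilution * 10.0


def diluent_volume_ul(stock_conc, stock_volume_ul, target_conc=2000.0):
    """Volume of diluent to add so the suspension reaches target_conc.

    Uses C1 * V1 = C2 * V2; target_conc defaults to the 2,000 cells/µl
    quoted in the text.
    """
    if stock_conc < target_conc:
        raise ValueError("stock is already below the target concentration")
    final_volume = stock_conc * stock_volume_ul / target_conc
    return final_volume - stock_volume_ul


# Hypothetical example: 250 viable cells per large square, 100 µl suspension.
stock = cells_per_ul(250)                      # 5000.0 cells/µl
print(stock, diluent_volume_ul(stock, 100.0))  # 5000.0 150.0
```

With these example numbers, 150 µl of diluent would bring 100 µl of suspension to the target concentration.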
10X Genomics library preparation
Single-cell libraries were prepared using the Single Cell 3’ Library & Gel Bead Kit v2 and Chip Kit according to the recommended 10X Genomics protocol. The single-cell suspension was loaded onto the Chromium Controller (10X Genomics). Library quantification assays and quality check analysis were performed using the 2100 Bioanalyzer instrument (Agilent Technologies). The library samples were then diluted to a 10 nM concentration and loaded onto two lanes of the NovaSeq 6000 (Illumina) instrument flow cell for a 100-cycle sequencing run. A total of 429,855,892 reads were obtained for the sample, with 28,995 mean reads per cell.
Quantification and statistical analysis
Pre-processing Chromium single-cell RNA-seq output
The raw sequencing data for the 10X Genomics Chromium single-cell 3’ RNA-seq library were initially processed using Cell Ranger (version 3.0.0), the recommended analysis pipeline from the Chromium single-cell gene expression software suite. The reference index for Cell Ranger was built using the Drosophila melanogaster Release 6 reference genome assembly [80] made available on the Ensembl genome database. The cellranger count pipeline for alignment, filtering, barcode counting and UMI counting was used to generate the multidimensional feature-barcode matrix of 14,825 cells (S1 Fig).
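As a quick consistency check, the reported mean reads per cell follows from dividing the total read count by the number of cells in the feature-barcode matrix. The sketch below only reproduces that arithmetic from the figures quoted in the text; the function and constant names are illustrative.

```python
TOTAL_READS = 429_855_892   # total reads reported for the sample
CELLS_CALLED = 14_825       # cells in the Cell Ranger feature-barcode matrix


def mean_reads_per_cell(total_reads, n_cells):
    """Mean sequencing depth per called cell."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return total_reads / n_cells


print(round(mean_reads_per_cell(TOTAL_READS, CELLS_CALLED)))  # 28995
```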
The Cell Ranger output was used for further processing using the R package Seurat (v2.3.4) [12, 84]. As part of this processing, low-quality cells (fewer than 775 genes expressed per cell), likely multiplets (exceeding a maximum of 2,200 genes or 18,000 UMIs per cell), and dead cells (greater than 1% mitochondrial gene expression) were filtered from the dataset (S1 Fig). Feature counts were log-normalized and scaled using default options (S1 Fig). Unwanted sources of intercellular variability were removed by regressing out possible variation driven by the number of UMIs and mitochondrial gene expression during data scaling. Scores for the expression of an expansive list of Drosophila G2/M and S phase genes (S2 File) were assigned to each cell using the function CellCycleScoring, which enabled the calculation of the difference between G2/M and S phase scores. This difference was then regressed out before downstream analysis to maintain the signals separating dividing and non-dividing cells while eliminating subtle differences among proliferative cells. Based on these scores, the cells were assigned a cell cycle phase (S2 Fig). To assemble these cells into transcriptomic clusters using meaningful features, the number of random variables in our dataset was reduced by obtaining sets of principal component (PC) vectors. Significant PCs were obtained by performing Principal Component Analysis (PCA), using 897 highly variable genes as input. The first 30 significant PCs were selected based on the Elbow method as input for UMAP clustering using default parameters. Altogether, these pre-processing steps resulted in a primary UMAP of 12,671 cells (S1 Fig).
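The per-cell quality thresholds quoted above can be expressed as a simple filter. The sketch below is not the Seurat call used in the study; it is a plain-Python illustration in which the `CellQC` record, the barcodes, and the example values are hypothetical, and treating the bounds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    barcode: str
    n_genes: int      # genes detected in the cell
    n_umis: int       # total UMIs in the cell
    pct_mito: float   # percent of UMIs from mitochondrial genes


# Thresholds taken from the text; inclusive bounds are an assumption.
MIN_GENES, MAX_GENES = 775, 2200
MAX_UMIS = 18_000
MAX_PCT_MITO = 1.0


def passes_qc(cell):
    """True if the cell is within all gene, UMI, and mitochondrial limits."""
    return (
        MIN_GENES <= cell.n_genes <= MAX_GENES
        and cell.n_umis <= MAX_UMIS
        and cell.pct_mito <= MAX_PCT_MITO
    )


cells = [
    CellQC("AAAC", 1300, 7100, 0.4),    # typical cell, kept
    CellQC("AAAG", 400, 900, 0.2),      # too few genes, removed
    CellQC("AACT", 2600, 21000, 0.3),   # too many genes and UMIs, removed
    CellQC("AAGT", 1100, 6000, 3.5),    # high mitochondrial fraction, removed
]
kept = [c.barcode for c in cells if passes_qc(c)]
print(kept)  # ['AAAC']
```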
Manual removal of contaminated cells using biological markers
The clusters obtained in this primary UMAP construction were further processed to remove ambient RNA contamination (cleaned) based on aberrant gene expression patterns. Since we did not find any unique cluster for ovary/oviduct-associated neuronal cell types (expressing the commonly known neuronal cell marker elav), all cells expressing elav were considered contaminants and removed from the dataset. Similarly, we cleaned the germline clusters by removing cells that expressed somatic cell markers: dec-1, Yp1/2/3, psd, Vml, Vm32E, Vm26Ab, and tj; adipocyte marker: Ilp6; muscle cell markers: Zasp66 and Mp20; and hemocyte marker: Hml. We cleaned the early somatic, polar, stalk, and mitotic follicle cell clusters by removing cells expressing germline cell markers: osk and bru1; mid-late somatic cell markers: dec-1, Vm32E, Vm26Ab, and psd; and hemocyte marker: Hml. We cleaned the mid-late clusters by removing cells expressing germline markers: osk and bru1; muscle cell markers: Zasp66 and Mp20; adipocyte marker: Ilp6; and hemocyte marker: Hml. We cleaned the muscle cell clusters by removing cells expressing germline cell markers: osk and yl; somatic cell markers: tj, Yp1/2/3, Vm32E, Vm26Ab, dec-1, psd, and Vml; and hemocyte marker: Hml. We cleaned the adipocyte cluster by removing cells expressing somatic cell markers: tj, dec-1, psd, Vml, Vm32E, Yp1/2/3, and Vm26Ab; germline cell markers: osk and yl; muscle cell markers: Zasp66 and Mp20; and hemocyte marker: Hml. A cut-off value of greater than 2 logFC was used to remove the contaminant cells. This manual cleaning strategy increased the number of highly variable genes (limits: >0.4 dispersion; >0.01 and <3 average expression) from 897 to 1075, which were then used as input for PCA on the cleaned dataset. The final dataset of high-quality cells consisted of 7,053 cells and 11,782 genes (S1 Fig).
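The marker-based cleaning described above amounts to dropping, from each cluster, any cell that expresses a marker of an unrelated lineage above a cut-off. The sketch below illustrates that logic for the germline clusters only. It is an assumption that the cut-off of 2 is applied to a per-cell expression value for each marker (the text does not specify how the logFC was computed), and the function name, barcodes, and expression values are hypothetical.

```python
# Off-lineage markers used to clean the germline clusters, as listed in the text.
GERMLINE_FOREIGN_MARKERS = [
    "dec-1", "Yp1", "Yp2", "Yp3", "psd", "Vml", "Vm32E", "Vm26Ab", "tj",  # somatic
    "Ilp6",                                                              # adipocyte
    "Zasp66", "Mp20",                                                    # muscle
    "Hml",                                                               # hemocyte
]
CUTOFF = 2.0  # assumed to apply to each marker's per-cell expression value


def remove_contaminants(cells, foreign_markers, cutoff=CUTOFF):
    """Keep cells in which no off-lineage marker exceeds the cut-off.

    cells maps barcode -> {gene: expression}; genes absent from a cell's
    dictionary are treated as not expressed.
    """
    kept = {}
    for barcode, expression in cells.items():
        if not any(expression.get(g, 0.0) > cutoff for g in foreign_markers):
            kept[barcode] = expression
    return kept


# Hypothetical germline-cluster cells.
germline_cells = {
    "cell_1": {"bam": 3.1, "orb": 0.2},               # clean germline cell
    "cell_2": {"orb": 2.8, "Yp1": 4.0},               # carries follicle-cell RNA
    "cell_3": {"orb": 2.5, "Hml": 0.5, "tj": 1.9},    # below cut-off, kept
}
clean = remove_contaminants(germline_cells, GERMLINE_FOREIGN_MARKERS)
print(sorted(clean))  # ['cell_1', 'cell_3']
```

The same function could be reused for the other cluster groups by swapping in the marker lists given for each of them.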
Cluster Validation of Replicate Data by Canonical Correlation Analysis (CCA)
The final 7,053-cell dataset was further compared to a 1,521-cell biological replicate dataset to assess the fidelity of the clustering (especially the trajectory of follicle cell clusters). This replicate dataset was derived from an original unprocessed dataset of 2,148 cells with 11,791 genes that was passed through less stringent filtering criteria (due to the low number of cells): a lower threshold of 250 genes per cell, and upper thresholds of 900 genes per cell, 2000 UMIs, and 1% mitochondrial gene expression. The two datasets were aligned using the 2,926 genes with the highest dispersion in both datasets. To detect common sources of variation between the two datasets, Canonical Correlation Analysis (CCA) was performed and 75 correlation vectors were used for downstream clustering. Upon plotting the UMAP using both datasets, we were able to validate the perceived trajectory of the follicle cells. All follicular-cell states and clusters obtained in the 7,053-cell dataset were recapitulated in the UMAP using both replicates. We used the replicate dataset only to validate the clustering analysis and did not use it for further downstream analysis, since the cell sampling varied and we were unable to achieve a comparable sequencing depth between the two datasets (median genes per cell for the 1,521-cell dataset is 404). The larger dataset (replicate 2) was used for all downstream analysis (S1 Fig).
UMAP clustering analysis
The 7,053-cell dataset (replicate 2) was log-normalized and scaled again using default parameters. The 1075 highly variable genes were selected as input for PCA and the first 75 PCs were selected to build the Shared Nearest-Neighbor (SNN) graph for clustering. To assemble cells into transcriptomic clusters, a graph-based clustering method using the SLM algorithm [8] was performed in Seurat. We chose to plot clusters on a UMAP (Uniform Manifold Approximation and Projection) because this dimensionality reduction technique arranges cells in a developmental time-course in a meaningful continuum of clusters along a trajectory [6]. A number of resolution parameters, ranging from 0.5 to 6, were tested, which resulted in 14 to 46 clusters. The relationship between clusters at each resolution was assessed using the R package clustree [103], on the basis of which a resolution of 6 was selected to obtain an initial number of 46 clusters (S2 Fig). Differentially expressed markers specific to each cluster were identified using the function FindAllMarkers (S3 File), and clusters with no unique markers were merged with their nearest neighbor after careful consideration of the differences in average expression pattern in each cluster. The final number of clusters was decided based on the uniqueness of observed and expected gene markers and the relative relationships with other clusters (S2 Fig). Cell type identities were then assigned to each cluster using known (S1 File) and experimentally validated markers.
Unsupervised re-clustering of cell subsets using Monocle (v2)
Smaller subsets of cells from the entire dataset were selected using the SubsetData function in Seurat. These subsets were re-clustered and imported into Monocle (v2) [74, 93] for further downstream analysis using the importCDS() function, with the parameter import_all set to TRUE to retain the cell-type identity assigned in Seurat for each cell. The raw UMI counts for these subsetted datasets were assumed to be distributed according to a negative binomial distribution and were normalized as recommended by the Monocle (v2) pipeline. The number of dimensions used to perform dimensionality reduction was chosen using the Elbow method. The cells were clustered in an unsupervised manner using the density peak algorithm, where the number of clusters was set for an expected number of cell types (as for early follicle cell differentiation states) or cell states (as for the mitotic-endocycle transition state, along with mitotic and endocycling follicle cells). The number of cell clusters for the “germline cells” subset and for the “oviduct cells” and “muscle cells” subset was chosen in an unsupervised manner based on significant rho (local density) and delta (distance of current cell to another cell of higher density) threshold values.
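The two quantities named above, rho and delta, can be illustrated with a small, self-contained sketch of density-peak clustering. This is not Monocle's implementation: the cut-off distance `d_c`, the thresholds, the toy 2-D coordinates, and the tie-handling rule (a cell with no strictly denser cell receives its maximum distance to any other cell) are all simplifying assumptions made for illustration.

```python
import math


def density_peak_stats(points, d_c):
    """Return (rho, delta) for each point.

    rho[i]   = number of other points closer than d_c (local density)
    delta[i] = distance to the nearest point with strictly higher rho;
               for points with no denser point, the largest distance
               to any other point (simplified tie handling).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        to_denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(to_denser) if to_denser else max(dist[i]))
    return rho, delta


def pick_centers(rho, delta, rho_min, delta_min):
    """Cluster centers are points that are both dense and far from denser points."""
    return [i for i in range(len(rho))
            if rho[i] >= rho_min and delta[i] >= delta_min]


# Two well-separated toy groups in a reduced 2-D space.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
rho, delta = density_peak_stats(points, d_c=1.2)
centers = pick_centers(rho, delta, rho_min=2, delta_min=5.0)
print(rho)      # [2, 1, 1, 2, 1, 1]
print(centers)  # [0, 3]
```

Raising or lowering the rho and delta thresholds changes how many points qualify as centers, which is how the number of clusters can be determined from the data rather than fixed in advance.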
Pseudotime Inference Analysis and Identification of Lineage-Specific Genes of Interest
Pseudotime inference analysis on known cell differentiation programs of oogenesis was performed using Monocle (v2). Cells were ordered in an unsupervised manner on a pseudotemporal vector based on genes that are differentially expressed over pseudotime between cell type identities assigned in Seurat or cell states identified as clusters in Monocle, depending on the clustering as mentioned in the previous section. Lowly expressed aberrant genes were removed from the ordering genes. Multiple trajectories were generated by ordering the cells using different numbers of statistically significant (q<0.05) genes that are expressed in a minimum number of predetermined cells, and the efficacy of the trajectories was tested with validated marker gene expression. The trajectory that reflected the most accurate cell state changes was then selected for downstream analysis. To assess transcriptional changes across a branching event, as seen in the early somatic and the polar/stalk trajectories, the function BEAM was used to analyze binary decisions of cell differentiation processes across a branch.
Gene ontology (GO) term enrichment Analysis
Genes were selected for downstream GO term enrichment analysis from the pseudotemporal heatmap by cutting the dendrogram that hierarchically clustered the genes expressed in a similar pattern across pseudotime using the R-based function cutree [7]. The web-based servers g:Profiler [75] and PANTHER [65] were then used for functional enrichment analysis on the genes. A user threshold of p=0.05 was used for these analyses.
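Over-representation analyses of this kind are commonly based on a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of term-annotated genes in a selected gene set by chance. The sketch below shows that calculation only; whether it matches the exact statistic and multiple-testing correction applied by each server is an assumption, and the function name and gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_annotated, n_selected, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_background: genes in the background set
    n_annotated:  background genes annotated with the term
    n_selected:   genes in the selected set (e.g. one dendrogram branch)
    n_overlap:    selected genes annotated with the term
    """
    total = comb(n_background, n_selected)
    upper = min(n_annotated, n_selected)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 6 of 20 selected genes carry a term that
# annotates 50 of 1,000 background genes.
p = enrichment_p_value(1000, 50, 20, 6)
print(p < 0.05)  # compare against the p=0.05 threshold quoted in the text
```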
Results
ScRNA-seq identifies unique cell clusters and markers to assign cell type identities
We generated the scRNA-seq library from a cell suspension of freshly dissected ovaries (and connected tissues) from adult female flies (Fig 1A). Following library sequencing, extensive quality control, and cell type-specific marker validation, we recovered 7,053 high-quality cells and clustered them into 32 cell-type identities (Fig 1B, S1 Fig and S2 Fig). This dataset has an average of ∼7,100 UMIs and ∼1,300 genes per cell, with each cell type having variable levels of mRNA content and gene expression (Fig 1C and 1D). We plotted this dataset on a scale of two primary axes for visualization using Uniform Manifold Approximation and Projection (UMAP) for dimension reduction of the cell/gene expression matrix (Fig 1B). This UMAP reflects the temporal and spatial development over the entirety of oogenesis, with connected ovarian clusters forming linear trajectories from stem cells onward, while surrounding tissues with non-temporally transitioning cells (muscle sheath, oviduct, adipocytes, and hemocytes) are arranged in compact and isolated clusters (Fig 1B and S2 Fig).
Established cell-type and stage-specific markers were used to identify the majority of the clusters (S1 File and Fig 1D). For the remaining clusters with no known markers, we assigned identity using expression patterns of at least 7 newly validated genes (Fig 1D and 1E). Atf3 and abd-A were used to identify cell types such as stalk cells and oviduct cells. Past1 was used to identify the stretched cells, and Ilp8, Diap1, Glut4EF, and Ance were used to identify late-staged follicle cells. Most of the new markers have overlapping expression in multiple cell types. For example, Atf3, a transcription factor involved in lipid storage [78], marks the cap and terminal filament cells in the germarium, pre-follicle cells, stalk cells, and corpus luteum cells (Fig 1E). Similarly, some markers are expressed in cells across multiple timepoints, thus marking a single cell type in several clusters. For example, Past1, which encodes a plasma membrane protein known to interact with Notch, marks the stretched cell lineage in clusters 24, 25, and 26 [72]. Altogether, we were able to assign cell type identities for all clusters and identified 6,296 genes that show significant expression in different clusters. Among them, 828 are unique cluster markers that may be specific to individual cell types (S3 File).
The transcriptional patterns of early germline development
Oogenesis begins in the germarium at the most anterior tip of each ovariole. There, supported by somatic niche cells, two to three germline stem cells (GSCs) produce daughter cells which move posteriorly through the niche and differentiate into cystoblast cells (CCs) [16]. These cells undergo four more rounds of synchronized mitosis with incomplete cytokinesis, producing 16 interconnected germline cyst cells. One of these cells becomes a transcriptionally quiescent oocyte, while the others develop into nurse cells that synthesize and transport products into the oocyte through ring canals [22] (Fig 2A).
The germline cells in our dataset were size selected through manual filtration (see Materials and methods), resulting in a sampling from GSCs to those in mid-oogenesis. These cells form a two-cluster trajectory (Fig 1B). The Germline 1 cluster includes cells in region 1 of the germarium (marked by bam expression) and the Germline 2 cluster includes cells from region 2 of the germarium and onward (marked by orb expression) [55, 63] (Fig 1D). The formation of the 16-cell cyst occurs at the boundary of germarium region 1 and 2. To uncover the underlying expression changes occurring at this time, we arranged the 112 germline cells on a pseudotemporal axis (Fig 2B) and plotted the differentially expressed genes along pseudotime. This revealed 50 genes that are expressed significantly before or after 16-cell cyst formation (Fig 2C). Gene Ontology (GO) enrichment of KEGG-pathway terms across pseudotime revealed the broad differences in activity before and after 16-cell cyst formation. Germline 1 cells are enriched for DNA replication and repair genes and Germline 2 cells switch to an enrichment in biosynthetic- and metabolic-pathway genes (Fig 2D). This is strikingly similar to the recent findings in a testis scRNA-seq study, which suggest an increase in mutational load in the immature germline cells of the testis and an early expression bias for DNA repair genes [97].
Selected germline-specific genes were experimentally validated and show varying expression patterns in the early stages of oogenesis (Fig 2E). Among these newly identified germline markers, specific expression of Mnb, a Ser/Thr protein kinase, in region 1 of the germarium and Mub, an mRNA splicing protein which appears only after 16-cell cyst formation, is of special interest [73, 89]. Other interesting expression patterns were identified in genes such as Fpps and Gp210, which briefly appear in the germarium and disappear for several stages before reappearing, demonstrating the dynamic regulation of early germline cell transcription.
Transcriptional trajectory of early somatic differentiation
The anterior region of the germarium houses somatic cells that include eight to ten terminal filament cells, a pair of cap cells, and the escort, or inner germarium sheath (IGS), cells. These collectively form the germline stem cell niche [31, 101] (Fig 3A). In the next region of the germarium is the somatic stem cell niche, where two or more follicle stem cells (FSCs) differentiate to form the pre-follicle cells (pre-FCs) that envelop the germline cyst cells. As egg chambers pinch off from the germarium, pre-follicle cells at the two poles assume polar cell fate upon Notch activation. The anterior polar cells then promote the specification of the stalk cells through JAK/STAT signaling [3]. The polar and stalk cells cease division upon differentiation while the other follicle cells remain mitotically active [82].
Due to the unsupervised nature of our clustering, the somatic cells in the germarium cluster together (Fig 1B). This suggests a common transcriptomic signature, which may be a response to the shared stem cell niche signaling. GO analysis for this group revealed an unexpected enrichment of nervous system development-related genes, among more general development- and morphogenesis-related genes (Fig 3E).
To determine the transcriptional trajectory during early somatic differentiation, we arranged the 1,837-cell subset from clusters containing somatic cells of the germarium, polar cells, stalk cells, and mitotic follicle cells on a pseudotemporal axis (Fig 3B and 3C). This pseudotemporal trajectory establishes a divergence of the follicle cell lineage after FSC/pre-FC differentiation, as the branch for mitotic follicle cells separates out from a common branch for the polar/stalk cell lineage (Fig 3C). This trajectory is consistent with the notion that polar and stalk cells share a common precursor stage and share expression of certain commonly upregulated transcription factors as shown in other studies [18, 95].
Considering the importance of transcriptional regulation in differentiation, we analyzed the temporal patterns of highly expressed genes selected for their function as either transcription regulators (GO:0140110) or transcription factors (PC00218) (Fig 3C’). Plotting these genes across pseudotime revealed that the polar/stalk cell fates are transcriptionally dynamic, involving genes from many signaling pathways. We highlighted the genes involved in the MAPK pathway (Fig 3C’). Fewer transcription factors are expressed in the mitotic follicle cell lineage (Fig 3C’). Among them are the chromatin remodeling protein HmgD and its physical interactor Nacα, suggesting a role of epigenetic regulation in the proliferative effort of these cells [34, 39] (Fig 3C’, 3F and 3G). The mitotic follicle cell lineage also shows a differential enrichment of ribosomal genes (KEGG : 03010, Padj = 2.20e−49), probably to support the upregulation of biosynthetic processes to sustain rapid proliferation (Fig 3E).
Fate decisions during polar and stalk cell differentiation
To characterize the fate separation between polar and stalk cells, we excluded the mitotic follicle cells from further analysis. The resulting 479 cells were then ordered once again along a pseudotemporal axis (Fig 3D). The resulting trajectory shows that the polar cells differentiate earlier than the stalk cells, which is consistent with the evidence that chemical cues from polar cells initiate stalk cell differentiation [3, 95]. To further identify genes that regulate polar and stalk cell differentiation, we plotted the most significant (q < 1e−5) differentially-expressed genes between the two fates (Fig 3D’). GO analysis of biological functions in the polar cell branch revealed a remarkable number of genes involved in processes related to nervous system development, neurogenesis, and neuron differentiation, similar to neuron-related expression in somatic cells of the germarium (Fig 3E).
Many such genes (e.g., Fas2, bbg, kek1, sli, shg, brat, Fas3, and CG18208) produce junctional proteins (GO:0005911, Padj = 5.563e−4) or proteins at the cell periphery (GO:0071944, Padj = 2.568e−2) (Fig 3D’). We validated the expression of sli, a novel polar cell marker, which is a secreted ligand for the Slit/Robo signaling pathway (Fig 3F and 3G). Another validated polar cell marker, Nrx-IV, is also associated with this pathway [5] (Fig 3F and 3G). In addition to axon guidance in developing neurons, Slit/Robo has been implicated in the regulation of tissue barriers [98], which is consistent with the observation that polar cells are terminally differentiated barriers between each egg chamber unit and connecting stalk cells [37].
GO term analysis of stalk cell-specific genes indicates a highly significant (q < 1e−5) upregulation of extracellular matrix genes (e.g. Col4a1, LanB1, and vkg) and cytoskeletal genes (e.g. LamC and βTub56D) that are also involved in muscle structure development (Fig 3D’ and 3E). Supporting this finding, we found a novel stalk cell marker, CG14207, that is also expressed in the epithelial muscle sheath (Fig 3F and 3G). Its human homolog, HspB8, interacts with Stv at the muscle sarcomere as part of a chaperone complex required for muscle Z-disc maintenance [2].
Catalytic genes upregulated during Mitosis-Endocycle transition of follicle cells
The transition between early and middle oogenesis (stages 6-7) occurs when the germline cells upregulate the ligand Dl, activating Notch signaling in the follicle cells, which initiates a mitosis-endocycle (M/E) switch [26] (Fig 4A).
To understand the regulation of the M/E switch at the single-cell level, we re-clustered the 2,691 follicle cells from clusters 7, 8, and 9 and arranged them across pseudotime (Fig 4B). Known Notch targets were used to validate cluster identity: ct and CycB in mitotic cells, peb in endocycling cells [85, 86], and all three in transitioning cells (Fig 4E). Pseudotime analysis revealed a linear arrangement for genes that change expression levels during the M/E switch. We validated some of these newly identified genes. For example, D1, jumu, and hdc are down-regulated, while Men and sm are upregulated in post-mitotic follicle cells (Fig 4F). The NADP[+] reducing enzyme, Men, is upregulated significantly in the anterior follicle cells and has a membrane localization. Sm, a member of the heterogeneous ribonucleoprotein complex, is of special interest given its ability to regulate Notch activity during wing development [50]. Its enrichment in endocycling follicle cells suggests a potential role for sm in the Notch-mediated M/E switch. Notably, upon GO term enrichment analysis of all significantly expressed genes that change as a function of pseudotime during the M/E switch, we found 43 genes with catalytic activity (GO:0003824) (Fig 4C). Enriched KEGG-pathway-related terms reveal an expression bias for proliferation and DNA repair associated genes in mitotic follicle cells, whereas endocycling cells express protein-processing and metabolic genes (Fig 4D).
Transcriptomic divergence of mid-staged follicle cells with subsequent convergence
During early oogenesis, access to morphogen signals from polar cells is restricted to the nearby terminal follicle cells (TFCs) on either end of the egg chamber [42]. The posterior terminal follicle cells receive a signal from the oocyte to activate EGFR signaling around stage 6, marking a symmetry breaking event in follicle cells. Cells at the anterior terminal further specify into border, stretched, and centripetal cells and undergo massive morphological changes during stages 9-10B [100] (Fig 5A).
Our dataset shows an unanticipated transcriptomic divergence for post-mitotic follicle cells, which provides a transcriptional basis for follicular symmetry breaking (Fig 1B). To identify the fate assumed by the cells in each resulting branch, we validated the expression of known markers at this stage and also novel markers uncovered from re-clustering 1,666 cells of this stage (Fig 5B). The main body follicle cell (MBFC) branch was identified using mirr and Cad99C expression [21, 49]. The TFC branch identity was validated by the expression of the newly identified anterior terminal cell marker, Past1 (Fig 5E).
We took the 1,666-cell subset of follicle cells during symmetry breaking and arranged them on a pseudotemporal axis (Fig 5B). Then we performed a GO term enrichment analysis of the differentially expressed genes at the branching point between MBFC and TFC fate. The MBFC fate shows an enrichment of genes in protein export (KEGG : 03060, Padj = 8.55e−20) and protein processing in the endoplasmic reticulum (KEGG : 04141, Padj = 1.13e−17); whereas the TFC fate has an enrichment of genes in endocytosis (KEGG : 04144, Padj = 1.70e−9), proteasome (KEGG : 03050, Padj = 3.46e−7), phagosome (KEGG : 04145, Padj = 6.97e−6), glutathione metabolism (KEGG : 00480, Padj = 2.09e−2), oxidative phosphorylation (KEGG : 00190, Padj = 2.01e−2), and Hippo pathway (KEGG : 04391, Padj = 3.95e−2). The 89 genes that show significant differences between these two branches along pseudotime are highlighted in a heatmap (Fig 5D). Many genes are differentially upregulated in these two branches much later in pseudotime.
We also identified novel genes showing expression that coincides with the symmetry breaking process (Fig 5F). These include FER and wb, which regulate cytoskeletal rearrangement, cell adhesion, and extracellular components. These genes may participate in cell shape changes necessary for border cell migration and/or stretched cell flattening [62, 68]. On the other hand, MBFC-specific expression of stx is interesting as it is involved with the proteasomal degradation regulating Polycomb (Pc) stability [29]. Maintenance of MBFC fate through regulation of chromatin modifiers is an attractive direction that merits further research.
Expression profiles of migrating border and centripetal cells
During stages 9-10B, specialized subsets of TFCs transition from a stationary to a migratory state. These include the border cells, which delaminate from the epithelium and move through the nurse cells to reach the oocyte. There, they meet the centripetal cells, which migrate inward to cover the anterior end of the oocyte (Fig 6A).
In our plot, we found that the TFC and MBFC branches converge to form a distinct cluster marked by slbo, which is expressed in migrating border and centripetal cells [66] (Fig 5C). To examine the transcriptomic signature of these migratory cells, we first used known stage 8-14 markers [46, 92] to set stage boundaries for the TFC branch (Fig 6B and 6C). These boundaries were then used to select genes expressed specifically during cell migration. We highlighted 14 representative genes involved in epithelial development (GO:0060429, Padj = 1.101e−5), a highly enriched GO term in this cluster. These include markers for border cell migration, such as sn, jar, and Inx2 [25, 35, 45, 79]. We also detected in this cluster the expression of Cad99C, which has been reported in several main body follicle cells and anterior-migrating centripetal cells [21]. These known markers confirm the correct selection of migrating cell types. This cluster also shows expression of other stage 9-10B markers, such as the vitelline membrane-related genes psd, Vm26Aa, Vm26Ab, and Vml [30, 92, 106]. With confidence in our selection of stage 9-10B migrating cells, we identified additional genes such as the protein transmembrane transporter Sec61α, the actin binding protein capt, the cargo receptor eca, and the Rho guanyl-nucleotide exchange factor RhoGEF64C, which may contribute to different aspects of the cell migration process [15, 34, 36, 41, 83] (Fig 6D).
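The idea of using known stage markers to bound a window along pseudotime, and then restricting attention to cells inside that window, can be sketched as follows. The text does not specify how the boundaries were computed, so the simple expression threshold used here is an assumption, and the function names, cell identifiers, and values are hypothetical.

```python
def marker_window(pseudotime, expression, threshold):
    """Return (start, end) pseudotime bounds of cells whose marker expression
    is at or above the threshold, or None if no cell passes."""
    on = [t for t, x in zip(pseudotime, expression) if x >= threshold]
    if not on:
        return None
    return min(on), max(on)


def cells_in_window(cell_ids, pseudotime, window):
    """Select the cells whose pseudotime falls inside the window (inclusive)."""
    start, end = window
    return [c for c, t in zip(cell_ids, pseudotime) if start <= t <= end]


# Toy input (hypothetical values): six cells ordered along pseudotime and the
# expression of one stage marker in each cell.
cell_ids = ["c0", "c1", "c2", "c3", "c4", "c5"]
pseudotime = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
marker_expr = [0.1, 0.2, 2.5, 3.0, 2.2, 0.3]

window = marker_window(pseudotime, marker_expr, threshold=1.0)
selected = cells_in_window(cell_ids, pseudotime, window)
print(window, selected)
```

With real data, several markers per stage and some smoothing along pseudotime would likely be needed before a boundary is drawn, since single-cell expression is noisy and sparse.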
Stretched cells share the transcriptional signature with hemocytes as they engulf nurse cells
During the final stages of oogenesis (stages 13-14), after the nurse cells transfer their cytoplasm into the oocyte, the remaining nuclei and cellular contents are removed by the stretched cells. This phagocytic activity of stretched cells is reminiscent of the response of hemocytes upon infection [90]. To determine whether genes expressed in the stretched cell cluster are also expressed in hemocytes, we examined the stage 13-14 specific genes identified from the pseudotemporally arranged 798-cell subset of the TFC branch. We identified 11 genes in this cluster (LRR, PGRP-SD, Irbp18, PGRP-LA, Hsp26, trio, bwa, Hsp67Bc, CecA2, Hsp27, and Hsp23) categorized by their involvement in immune system process (GO:0002376). We also compared genes enriched in the stretched cells with those in the hemocyte cluster and found 79 genes in common. Of these, 30 genes with the highest expression are shown in a heatmap ordered across pseudotime (Fig 6E). Some immune genes have been identified previously in nurse cell engulfment, such as the phagocytic gene drpr, and a scavenger receptor gene crq, confirming sampling of the correct developmental time-point for analysis [64, 90]. The newly identified genes in the stretched cell cluster fall into six general categories of activity: endocytosis/vesicle mediated transport (Syx1A, RabX1, AnxB9, and shrb), antibacterial/immune response (CecA1 and LRR), morphogenesis (Mob2, CG44325, RhoGAP71E, and RhoL), catalytic/metabolic (CG12065, Cip4, and Nmda1), lipid binding (Cip4 and Gdap2), and metal ion transport, especially zinc and magnesium (spict, Swip-1, ZnT63C, and Zip99C). In addition, we validated three new stretched-cell genes (Fig 6F) which are also expressed in hemocytes: a proteolytic enzyme, Cp1 involved in cellular catabolism, an oxidation-reduction enzyme, GILT1, involved in bacterial response, and Sap-r, a lysosomal lipid storage homeostasis gene with known expression in embryonic hemocytes [54, 81, 94]. 
Together, these findings suggest that stretched cells and hemocytes share transcriptomic signatures required for apoptotic cell clearance, reinforcing their role as “amateur” phagocytes at this stage of development [38].
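The comparison between stretched-cell and hemocyte gene lists described above amounts to intersecting two sets of cluster-enriched genes and ranking the shared genes by expression. A minimal sketch is given below; the function name, the gene identifiers, and the expression values are hypothetical placeholders, and ranking by mean expression is an assumption about how "highest expression" was defined.

```python
def shared_top_genes(enriched_a, enriched_b, mean_expr, top_n):
    """Return up to top_n genes enriched in both clusters, ordered by
    decreasing mean expression (ties broken alphabetically)."""
    shared = set(enriched_a) & set(enriched_b)
    ranked = sorted(shared, key=lambda g: (-mean_expr.get(g, 0.0), g))
    return ranked[:top_n]


# Toy input (hypothetical identifiers and values, not data from this study).
stretched_enriched = ["gene1", "gene2", "gene3", "gene4"]
hemocyte_enriched = ["gene2", "gene3", "gene4", "gene5"]
mean_expr_in_stretched = {"gene2": 1.5, "gene3": 4.0, "gene4": 2.5}

top = shared_top_genes(
    stretched_enriched, hemocyte_enriched, mean_expr_in_stretched, top_n=2
)
print(top)
```

The same pattern would apply to any pair of clusters, for example when comparing late-stage follicle cells with corpus luteum cells.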
Gene expression of vitellogenic main body follicle cells
The clusters for the MBFCs show an enrichment of genes that facilitate vitellogenesis (stages 8-14) and eggshell formation (stages 10-14; Fig 1D). We further analyzed the MBFC clusters and found highly variable gene expression patterns (Fig 6B and 6G). Genes enriched in clusters 10-13, presumably consisting of stage 8-10A MBFCs, include histone binding protein-coding genes such as Nlp, Nph, and P32, which have been shown to cooperate in the post-fertilization regulation of sperm chromatin [32]. Starting in cluster 16, marked by the stage 10B-specific marker Fcp3C, chorion-related genes such as CG14187, the acid phosphatase CG9449, and the signaling receptor CG7530 are upregulated. Stage-12 and stage-14 follicle cells (clusters 18 and 19, respectively) express well-known markers involved with chorion production (e.g., CG4009, CG15570, CG13114, yellow-g, yellow-g2, CG31928, Muc12Ea, Cp16, Cp18, and Cp15) [92] (Fig 6G).
Cellular heterogeneity and markers in the corpus luteum
Ovulation occurs when a mature egg sheds the follicle-cell layer and exits the ovary on its way to be fertilized, following Mmp2-dependent rupture of posterior follicle cells. The follicle-cell layer, devoid of the egg as a substrate, remains in the ovary and develops into a corpus luteum, similar to ovulation in mammals [24].
As mentioned previously, we validated a number of genes such as Ance, Diap1, Ilp8, and Glut4EF, which all show expression in the corpus luteum cell clusters (Fig 1E). The insulin-like peptide, Ilp8, involved in coordinating developmental timing, is greatly upregulated in stage 14 follicle cells and persists in corpus luteum cells [20]. The caspase binding enzyme, Diap1, is highly expressed in late stage (11-14) anterior follicle cells and persists in anterior corpus luteum cells [58]. The transcription factor, Glut4EF, shows increased expression from stage-10B main body follicle cells and reaches the highest expression level in stage-14 follicle cells and corpus luteum cells [102]. Expression of Ance, a gene producing an extracellular metallopeptidase, is specific to the terminal corpus luteum cells, as well as subsets of oviduct and dorsal appendage forming cells [77].
To explore cellular and transcriptomic heterogeneity of the corpus luteum, we re-clustered the 133-cell subset of corpus luteum cells from original clusters 21, 27 and 28 (Fig 6A). The cells re-clustered into 3 groups, labeled clusters 0, 1 and 2 (Fig 6B). Both Mmp2 and Ance are expressed in clusters 0 and 1, indicating that they are composed of the terminal follicle cells of the corpus luteum, likely at different timepoints (Fig 7B). This also indicates that the anterior and posterior corpus luteum might be transcriptionally similar. Cluster 2 most likely represents the cells derived from main body follicle cells as they express genes such as Ilp8 and Glut4EF that are expressed throughout the corpus luteum (Fig 7B). These results suggest cellular heterogeneity in the corpus luteum with specific functions of cells in different regions.
A transcriptomic switch from oogenesis to ovulation regulation in pre-corpus luteum cells
As stated previously, corpus luteum-enriched genes, Ilp8 and Glut4EF, begin their peak expression in late stage-14 follicle cells. A third, viral-response gene, vir-1, displays a similar pattern of sudden upregulation in stage 14 follicle cells and continued expression in corpus luteum cells after ovulation [28] (Fig 7D). Because of this shared expression timing of non-eggshell-related genes, we considered the stage-14 clusters from the stretched cell and MBFC lineage as a “pre-corpus luteum” and compared genes shared by these cells and those in the corpus luteum to gain insight into potential ovulation-related genes at the end of oogenesis.
GO term enrichment analysis shows that the genes identified using this method are involved in various biological processes, such as columnar/cuboidal epithelial cell development, growth, maintenance of epithelial integrity, cellular response to stimulus, signal transduction, and JNK cascade. Several key developmental pathways such as MAPK, endocytosis, autophagy, longevity, and Wnt signaling are also enriched (Fig 7C). Two of the genes identified, Nox, an NADPH oxidase, and Octβ2R, an octopamine receptor, have been identified as essential for ovulation through calcium regulation in the oviduct [59, 76]. Consistent with this, many of these ovulation-related genes also share expression with the cells of the oviduct and hemocyte clusters, as observed in the feature plots and vir-1 images (Fig 7C’ and 7D).
Discussion
In this study, we used scRNA-seq to survey the expression profiles of cells from the adult Drosophila ovary. Using a previously unreported approach, we recovered high-quality cells by removing contaminants with conflicting marker expression and experimentally validating the identity of clusters using new markers. During dissection, instead of mechanically separating intimately connected tissues (i.e., muscle sheath, hemocytes, oviduct, and fat body) from the ovary, we chose to leave them attached and include them in the dataset. Separating cells from different tissues in silico, rather than mechanically, prevented damage to the ovarian cell types of interest and improved feature selection in downstream analysis. This approach allowed the clustering of all possible cell types that are physically connected to the ovary, thus accounting for cells that otherwise would have appeared as unknown contaminants. This in turn enabled stringent fidelity assessment of cells based on an all-encompassing list of conflicting markers, and ultimately the recovery of high-quality cells.
With a special focus on the most abundant ovarian cell type, the follicle cells, we identified their entire spatiotemporal trajectory from the stem cell niche to the corpus luteum. Using in silico subset analyses, we identified the transcriptomic basis for early differentiation of polar and stalk cells from the main body follicle cells, mitosis-to-endocycle switch, and follicular symmetry breaking. We also identified transcriptomic signatures of different follicle cell groups that carry out important developmental functions such as migration, engulfment of nurse cells, and eggshell formation. Remarkably, the dataset not only reveals a novel split in the transcriptome during symmetry breaking, but also a convergence of late-stage follicle cells as they form the corpus luteum. During this convergence, we identify ovulation-related genes in late-stage follicle cells (termed pre-corpus luteum) which may signify a novel developmental switch from oogenesis to ovulation regulation.
An unexpected advantage of this approach is the ability to analyze the relationship between ovarian and non-ovarian cell types, which show functional convergence between cells of different tissues. For example, the nurse-cell engulfing stretched cells express genes shared by the hemocytes. While some immune-related genes have been described in these “amateur” phagocytes [38], other morphology-regulating genes shared with hemocytes have not yet been identified. This introduces an interesting possibility that aspects of stretched cell and hemocyte morphology may be essential for the engulfment of cellular material, which necessitates further research. Additionally, cells in the corpus luteum possess a transcriptomic signature that has overlapping genes expressed in the oviduct cells and hemocytes, indicating a potential shared function or interaction between these cell types in regulating ovulation. This is consistent with reports in mammals that the corpus luteum functions as an endocrine body for control of reproductive timing [1, 69], and has signaling cross-talk with macrophages [14, 99]. Overall, our study provides a broad perspective of functional relatedness among cell types regulating oogenesis and ovulation. The convergence of such transcriptional “tool-kits” between developmentally unrelated cell types is an emerging theme identified using this diverse dataset. Curating information on genes that define these overlapping functions will not only help validate our current understanding of gene ontology but also identify unique genes that may have differential functions in specific cell types.
Taken together, our study provides a novel perspective of oogenesis, identifies cell-type and stage markers, and reveals functional convergence in expression between ovarian and non-ovarian cell types. Additionally, it is now possible to use this single-cell dataset to better understand the intercellular and inter-tissue signaling regulating oogenesis and ovulation.
Acknowledgments
Special thanks to Roger Mercer, Yanming Yang, Cynthia Vied, Amber Brown, and Brian Washburn for their assistance in library preparation and sequencing. The authors also acknowledge Yue Julia Wang, Jerome Irianto, Michelle Arbeitman, and Jen Kennedy for help in editing and reviewing the manuscript. The authors would also like to thank Colleen Palmateer for assistance in developing the dissociation protocol and David Corcoran, Brian Oliver, Shamik Bose, Sarayu Row, Ishwaree Datta, Shangyu Gong, Chih-Hsuan Chang, and 10X support for helpful discussion, troubleshooting, inspiration, and assistance. The 10X Chromium controller and other essential hardware were provided by the FSU College of Medicine Translational Science Laboratory. Special thanks to the Norbert Perrimon Lab, the Hugo Bellen Lab, the Jin Jiang Lab, and the Gene Disruption Project, which have contributed to generating transgenic lines used in our study. W.-M.D. is supported by NIH GM072562, CA224381, CA227789 and NSF IOS-155790.
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- 64.
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.
- 95.
- 96.
- 97.
- 98.
- 99.
- 100.
- 101.
- 102.
- 103.
- 104.
- 105.
- 106.