bioRxiv
New Results

Changes in epithelial proportions and transcriptional state underlie major premenopausal breast cancer risks

Lyndsay M Murrow, Robert J Weber, Joseph Caruso, Christopher S McGinnis, Kiet Phong, Philippe Gascard, Alexander D Borowsky, Tejal A Desai, Matthew Thomson, Thea Tlsty, Zev J Gartner
doi: https://doi.org/10.1101/430611
Lyndsay M Murrow
1Department of Pharmaceutical Chemistry and Center for Cellular Construction, University of California San Francisco, San Francisco CA 94158
Robert J Weber
1Department of Pharmaceutical Chemistry and Center for Cellular Construction, University of California San Francisco, San Francisco CA 94158
2Medical Scientist Training Program (MSTP), University of California, San Francisco, San Francisco, California, USA
3Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco CA 94158
Joseph Caruso
4Department of Pathology and Helen Diller Cancer Center, University of California San Francisco, San Francisco CA 94143
Christopher S McGinnis
1Department of Pharmaceutical Chemistry and Center for Cellular Construction, University of California San Francisco, San Francisco CA 94158
Kiet Phong
1Department of Pharmaceutical Chemistry and Center for Cellular Construction, University of California San Francisco, San Francisco CA 94158
Philippe Gascard
4Department of Pathology and Helen Diller Cancer Center, University of California San Francisco, San Francisco CA 94143
Alexander D Borowsky
5Center for Immunology and Infectious Diseases, Department of Pathology and Lab, University of California Davis, Davis CA 95696
Tejal A Desai
3Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco CA 94158
Matthew Thomson
6Computational Biology, Caltech, Pasadena CA 91125
Thea Tlsty
4Department of Pathology and Helen Diller Cancer Center, University of California San Francisco, San Francisco CA 94143
Zev J Gartner
1Department of Pharmaceutical Chemistry and Center for Cellular Construction, University of California San Francisco, San Francisco CA 94158
7Chan Zuckerberg Biohub, University of California San Francisco, San Francisco CA 94158
  • For correspondence: zev.gartner@ucsf.edu

Abstract

The human breast undergoes lifelong remodeling in response to estrogen and progesterone, but hormone exposure also increases breast cancer risk. Here, we use single-cell analysis to identify distinct mechanisms through which breast composition and cell state affect hormone signaling. We show that prior pregnancy reduces the transcriptional response of hormone-responsive (HR+) epithelial cells, whereas high body mass index (BMI) reduces overall HR+ cell proportions. These distinct changes both impact neighboring cells by effectively reducing the magnitude of paracrine signals originating from HR+ cells. Because pregnancy and high BMI are known to protect against hormone-dependent breast cancer in premenopausal women, our findings directly link breast cancer risk with person-to-person heterogeneity in hormone responsiveness. More broadly, our findings illustrate how cell proportions and cell state can collectively impact cell communities through the action of cell-to-cell signaling networks.

Introduction

The rise and fall of estrogen and progesterone with each menstrual cycle and during pregnancy controls cell growth, survival, and tissue morphology in the human breast. The impact of these changes is profound, and lifetime exposure to cycling hormones is a major modifier of breast cancer risk (1). In addition to the dynamics observed within individuals in response to changing hormone levels, there is also a high degree of heterogeneity between individuals in epithelial architecture (2), cell composition (3), and hormone responsiveness (4–6), and these differences likely impact breast cancer susceptibility. However, because the breast is both highly variable between women and undergoes dynamic changes over time, it has been difficult to link differences in breast cancer risk with specific biological mechanisms in the breast.

One approach has been to identify specific cellular and molecular changes associated with established breast cancer risk factors identified by epidemiological studies. Reproductive history and body mass index (BMI) are two factors that strongly influence breast cancer risk. Pregnancy has two opposing effects: it increases short-term risk by up to 25% (7) but decreases lifetime risk by up to 50%, particularly for women with a first pregnancy early in life (8). Obesity has opposing effects on risk before versus after menopause: it increases postmenopausal risk by around 30% (9) but decreases premenopausal risk by up to 45% (10, 11). The protective effects of both BMI and pregnancy are strongest for estrogen- and progesterone-receptor positive (ER+/PR+) breast cancers (11, 12), suggesting that altered hormone signaling is one mechanism contributing to the tumor-protective effect of these two factors. The mechanistic link between pregnancy and the long-term reduction in breast cancer risk remains an open question, but it has been speculated that the effects of pregnancy-induced alveolar differentiation—such as changes in the epithelial architecture of the mammary gland or a general decrease in the hormone responsiveness of the epithelium—may contribute to reduced risk (2, 8). While estrogen production by adipose tissue is a major mechanism proposed to contribute to the increased risk of postmenopausal breast cancer in obese women (13), far less is known about the underlying mechanisms that link obesity and the decreased risk of ER/PR+ premenopausal breast cancer.

One challenge for understanding the relationship between hormone signaling, pregnancy, and BMI in the healthy human breast is that many of the effects of ovarian hormones within the breast are indirect. The estrogen and progesterone receptors (ER/PR) are expressed in only 10-15% of hormone-responsive (HR+) luminal cells within the epithelium (14), and most of the effects of hormone receptor activation are mediated by a complex cascade of paracrine signaling from HR+ luminal cells to other cell types in the breast. Thus, decreased hormone responsiveness in the parous breast could reflect either: 1) a change in the hormone signaling response of HR+ luminal cells—due to either changes in HR+ luminal cells themselves or non-cell autonomous changes in hormone levels or availability—and/or 2) a reduction in the proportion of HR+ luminal cells, leading to dampened paracrine signaling to other cell types downstream of ER/PR activation. Single-cell RNA sequencing (scRNAseq) is particularly well-suited to investigate this problem, since it enables unbiased classification of the full repertoire of cell types within the human breast together with their transcriptional state.

Here, we use scRNAseq of twenty-eight premenopausal reduction mammoplasty tissue specimens, together with FACS and immunostaining in an expanded cohort (table S1), to directly measure sample-to-sample variability in cell proportions and cell signaling state in the breast. We develop a computational approach that leverages the inter-sample transcriptional heterogeneity in our dataset to identify coordinated changes in transcriptional states across cell types in the breast. First, using this approach, we identify a set of correlated gene expression programs in HR+ luminal cells and other cell types representing the paracrine signaling network activated in response to hormones. Second, we find that prior history of pregnancy is associated with striking changes in epithelial composition, and we propose that these changes are consistent with the protective effect of pregnancy on lifetime breast cancer risk. Finally, we show that pregnancy and obesity both lead to decreased hormone response in the breast through two distinct mechanisms: pregnancy directly affects hormone signaling in HR+ luminal cells whereas obesity reduces the proportion of HR+ luminal cells. Overall, these results provide a comprehensive map of the cycling human breast and identify cellular changes that underlie breast cancer risk factors.

Results

Inter-sample variability in epithelial cell proportions and transcriptional cell state in the human breast

To identify inter-individual differences in cell composition and cell state in the human breast, we performed scRNAseq analysis on 86,136 cells from reduction mammoplasties in 28 premenopausal donors (Fig. 1A and table S1). To obtain an unbiased snapshot of the epithelium and stroma, we collected live/singlet cells for all samples. For a subset of samples, we also collected epithelial cells or purified luminal and basal/myoepithelial cells (fig. S1A, table S2). We used MULTI-seq barcoding and in silico genotyping for sample multiplexing to minimize technical variability between samples (fig. S1B, methods) (15, 16).

Fig. S1. Sorting strategy and MULTI-seq barcoding for scRNAseq experiments.

(A) FACS plots depicting sort gates used for sequencing. (B) TSNE dimensionality reduction of the normalized barcode count matrices and final sample classification for MULTI-seq experiments (Batches 3 and 4). (C) UMAP dimensionality reduction of the combined data from twenty-eight samples for each sort population.

Fig. 1. Sample-to-sample variability in epithelial cell proportions and transcriptional cell state in the human breast.

(A) scRNAseq workflow: Reduction mammoplasty samples were processed to a single cell suspension, followed by MULTI-seq sample barcoding, FACS purification, and library preparation. (B) UMAP dimensionality reduction and unsupervised clustering of the combined data from twenty-eight samples identifies the major epithelial and stromal cell types in the breast. (C) Stacked bar plot of the proportion of epithelial cells (HR+ luminal; secretory luminal; basal/myoepithelial) across breast tissue samples. (D) Density plots highlighting the transcriptional cell state of HR+ luminal cells from individuals with at least 100 cells in this cluster.

Sorted basal and luminal cell populations were well-resolved by uniform manifold approximation and projection (UMAP) (fig. S1C). Unsupervised clustering identified one myoepithelial/basal cluster, two luminal clusters, and six stromal clusters (Fig. 1B). Based on the expression of known markers, the two luminal clusters were annotated as hormone-responsive (HR+) and secretory luminal cells, and the six stromal clusters were annotated as fibroblasts, blood endothelial cells, lymphatic cells, vascular accessory cells, lymphocytes, and macrophages (Fig. 1B and fig. S2, A and B). The luminal populations described here closely match those identified as “hormone-responsive/L2” and “secretory/L1” in a previous scRNAseq analysis of the human breast (17), as well as microarray data for sorted EpCAM+/CD49f− “mature luminal” and EpCAM+/CD49f+ “luminal progenitor” populations (18). Here, we use the nomenclature hormone-responsive/HR+ and secretory to refer to these two cell types. The HR+ cluster was enriched for the hormone receptors ESR1 and PGR (fig. S2C), and other known markers such as ANKRD30A (fig. S2, A and B) (17). Consistent with previous studies demonstrating variable hormone receptor expression across the menstrual cycle (19), expression of ESR1 and PGR transcripts was sporadic and often non-overlapping. Within the HR+ luminal cluster, 22% of the cells had detectable levels of ESR1 or PGR, with only 2% of cells expressing both transcripts (fig. S2D).

Fig. S2. Marker analysis of cell type clusters for scRNAseq experiments.

(A) Heatmap highlighting marker genes used to identify each cell type. For visualization purposes, we randomly selected 100 cells from each cluster. (B) UMAPs depicting expression of selected markers in log counts. (C) Dot plot depicting the log normalized mean and frequency of ESR1 and PGR expression across cell type clusters. (D) Venn diagram highlighting the frequency of ESR1 and PGR expression and percent overlap in the HR+ luminal cell cluster.

Beyond identifying the major cell types, single-cell analysis additionally resolved two sources of inter-sample variability in the human breast. First, while cells from different individuals were represented across all clusters (cluster entropy = 0.93, methods) (fig. S3A), the proportions of epithelial cell types were highly variable between samples (Fig. 1C). Across individuals, epithelial cell proportions in the live/singlet and epithelial sort gates ranged from 2-80% for basal/myoepithelial cells, from 7-89% for HR+ luminal cells, and from 9-70% for secretory luminal cells (fig. S3B). Second, independent of variation in cell proportions, individuals displayed distinct transcriptional signatures within cell types (Fig. 1D and fig. S3C). This variation in cell state was not due to technical variability across batches (table S2), as cells from the same sample were more similar to each other than cells from different samples, regardless of the day of processing (fig. S3, D and E, methods).
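
The definition of "cluster entropy" is given in the methods, which are not reproduced in this section. As a rough illustration only, one common way to score how evenly samples are mixed within a cluster is the Shannon entropy of the sample labels, normalized so that 1 means perfectly even mixing and 0 means a single sample. The sketch below assumes that definition, which may differ from the one the authors used; the function name `normalized_entropy` and its arguments are hypothetical.

```python
import math
from collections import Counter


def normalized_entropy(labels, n_groups=None):
    """Normalized Shannon entropy of a list of sample labels.

    ASSUMPTION: this is one common mixing score, not necessarily the
    "cluster entropy" defined in the paper's methods.
    labels   -- sample ID for every cell in one cluster
    n_groups -- total number of samples in the experiment; defaults to
                the number of distinct labels observed in this cluster
    """
    counts = Counter(labels)
    total = sum(counts.values())
    k = n_groups if n_groups is not None else len(counts)
    if total == 0 or k < 2:
        return 0.0  # entropy is undefined or trivially zero
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(k)  # divide by the maximum possible entropy
```

Under this assumed definition, a value close to 1 for every cluster would indicate that no cluster is dominated by cells from a single individual.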

Fig. S3. Two sources of inter-sample variability in the breast.

(A) UMAP for each sample highlighting cell types identified by unsupervised clustering. Cells from different individuals are represented across all clusters (cluster entropy = 0.93). (B) Quantification of the proportion of epithelial cells (basal/myoepithelial; HR+ luminal; secretory luminal) in each sample, with the cross-sample median and range for each cell type (n = 28 samples). (C) Density plots highlighting the transcriptional cell state of basal/myoepithelial cells, secretory luminal cells, or fibroblasts from individuals with at least 100 cells in each cluster. (D) UMAP of samples that were run across multiple batches, highlighting cells from each batch. (E) Quantification of the “mixing metric”—or similarity—between cells from the same or different sample and batch for the indicated cell types. See table S2 for sample and batch information.

Parity is associated with an increased proportion of basal/myoepithelial cells in the epithelium

The breast undergoes a major expansion of the mammary epithelium during pregnancy, followed by a regression back towards the pre-pregnant state after weaning in a process called involution. However, the epithelial architecture remains distinct from that of women without prior pregnancy, consisting of larger terminal ductal lobular units (TDLUs) containing greater numbers of acini. At the same time, individual acini are reduced in size (2). We hypothesized that these architectural changes would be a major driver of differences in epithelial cell proportions between samples in our dataset.

We focused our initial analysis on the 63,583 cells in the live/singlet and epithelial sort gates to get an unbiased view of how the epithelial composition of the breast changes with pregnancy. The proportion of basal/myoepithelial cells in the epithelium was approximately two-fold higher in women with prior history of pregnancy (parous) relative to women without prior pregnancy (nulliparous) (Fig. 2A and fig. S4A). We confirmed these results in an expanded cohort of samples using three additional methods. First, we measured basal cell proportions by flow cytometry analysis of EpCAM and CD49f. Consistent with clustering results, parity was associated with an increase in the average proportion of basal cells from 12% to 39% of the epithelium (Fig. 2B). The proportion of basal cells did not vary with other discriminating factors such as BMI, race, or hormonal contraceptive use (HC), but was weakly associated with age (R2 = 0.20, p < 0.04) (fig. S4B). To determine the relative effect of each factor, we performed multiple linear regression analysis and found that the basal cell fraction positively correlated with pregnancy history (p < 2e-05), but not age (p = 0.17) (Table S3). Next, as FACS processing steps may affect tissue composition, we performed two further analyses. We reanalyzed previously published microarray datasets of total RNA isolated from core needle biopsies from premenopausal (n = 71 parous/ 42 nulliparous) or postmenopausal (n = 79 parous/ 30 nulliparous) women (20, 21), and confirmed a significant increase in the basal/myoepithelial markers KRT5, KRT14, and TP63 relative to luminal markers in parous samples (fig. S4C). Finally, we performed immunostaining and confirmed an approximately 2-fold increase in the ratio of p63+ basal cells to KRT7+ luminal cells in intact tissue sections (Fig. 2C). Notably, staining demonstrated that this change in epithelial proportions was specific to TDLUs rather than ducts (fig. S5A). 
We hypothesized that the increased frequency of basal/myoepithelial cells observed in parous women could be explained, in part, by changes in TDLU architecture. To test this, we performed a morphometric comparison of TDLUs between parous and nulliparous samples in our dataset. Consistent with previous reports (2), we observed a marked decrease in the average diameter of individual acini in parous women (fig. S5B). Additionally, we found that the average thickness of the luminal cell layer was linearly associated with acinus diameter (fig. S5C) and reduced in parous women (fig. S5D).

Fig. S4. Prior pregnancy is associated with changes in epithelial cell proportions.

(A) Quantification of the proportion of basal/myoepithelial cells (Basal), HR+ luminal cells (HR+), and secretory luminal cells (Secretory) in the mammary epithelium of nulliparous (NP) versus parous (P) samples, as identified by scRNAseq clustering (n = 28 samples; Wald test). (B) Quantification of the percentage of EpCAM−CD49f+ basal cells identified by FACS analysis versus age (n = 23; R2 = 0.20; p < 0.04, Wald test), body mass index (n = 21; R2 = 0.03; p = 0.44, Wald test), race (n = 23; p = 0.55, Mann-Whitney test), or hormonal contraceptive use (n = 23; p = 0.50, Kruskal-Wallis test). (C) Microarray differential expression analysis for selected genes from Santucci-Pereira et al. and Peri et al. (20, 21).

Fig. S5. Prior pregnancy is associated with changes in epithelial architecture.

(A) Immunostaining for the basal/myoepithelial marker p63 and pan-luminal marker KRT7, and quantification of the ratio of p63+ myoepithelial cells to KRT7+ luminal cells in the ducts and terminal ductal lobular units (TDLUs) for parous (P) versus nulliparous (NP) samples (n = 14 samples; Mann-Whitney test). Scale bars 50 μm. (B) Quantification of the average acinar diameter in TDLUs from nulliparous (NP) versus parous (P) samples (n = 14 samples; p < 0.002, Mann-Whitney test). (C) Linear regression analysis of the width of the luminal layer versus acinus diameter for individual acini with circularity greater than 0.75 (n = 56 acini from 15 samples; R2 = 0.89, p < 0.0001, Wald test). (D) Quantification of the average thickness of the luminal layer in TDLUs of nulliparous (NP) versus parous (P) samples (n = 14 samples; p < 0.002, Mann-Whitney test). (E) Quantification of the average luminal cell density (nuclei per μm2 of luminal area) in TDLUs from nulliparous (NP) versus parous (P) samples (p = 0.43, Mann-Whitney test). (F) Left: Linear regression analysis of the perimeter of the luminal layer versus the number of p63+ basal cells for individual acini (n = 72 acini from 13 samples; R2 = 0.55, p < 0.0001, Wald test). Right: Linear regression analysis of the area of the luminal layer versus the number of KRT7+ luminal cells for individual acini (n = 72 acini from 13 samples; R2 = 0.81, p < 0.0001, Wald test).

Fig. 2. Prior history of pregnancy is associated with an increased proportion of basal cells in the mammary epithelium.

(A) UMAP plot of sorted live singlet and epithelial cells from nulliparous and parous samples, with the percent of luminal and basal/myoepithelial cells highlighted. (B) Representative FACS analysis of the percentage of EpCAM−/CD49f+ basal cells within the Lin− epithelial population, and quantification of the percentage of basal cells in parous (P) versus nulliparous (NP) women (n = 18 samples; p < 0.0001, Mann-Whitney test). (C) Immunostaining for the basal/myoepithelial marker p63 and pan-luminal marker KRT7, and quantification of the ratio of p63+ basal cells to KRT7+ luminal cells for samples with or without prior history of pregnancy (NP = nulliparous, P = parous; n = 13 samples; p < 0.001, Mann-Whitney test). Scale bars 50 μm. (D) Two-dimensional geometric model of the relative space available for basal cells (outer perimeter of the luminal layer, P) and luminal cells (area of the luminal layer, A) within individual acini. Acini were modeled as hollow circles with a shell thickness proportional to their diameter. (E) Quantification of the average basal cell coverage (nuclei per μm of luminal perimeter) in terminal ductal lobular units (TDLUs) from nulliparous (NP) versus parous (P) samples (p = 0.66, Mann-Whitney test). (F) Results of geometric modeling depicting the relative area and perimeter of the luminal layer as a function of acinus diameter. Dots represent measurements of individual acini from TDLUs in parous (n=53 acini from 7 samples) or nulliparous (n=29 acini from 7 samples) women as indicated (mean absolute percentage error = 9.5%).

To determine how these parameters influence the relative proportions of each cell type, we implemented a simple geometric model. Based on our measurements, we modeled each acinus in two dimensions as a hollow circle with a shell thickness linearly proportional to its diameter (Fig. 2D). Since basal cells form a monolayer along the luminal surface, we represented the space available for basal cells as the outer perimeter of the luminal layer, and the space available for luminal cells as the area of the luminal layer. Surprisingly, when normalized to cross-sectional area (for luminal cells) or perimeter (for basal cells), there was no change in luminal cell density or basal cell coverage between parous versus nulliparous samples (Fig. 2E and fig. S5E). Across all samples, the number of basal or luminal cells per acinus was directly proportional to the space available for each cell type (fig. S5F). However, geometric modeling accurately predicted the relationship between the luminal area and outer perimeter for individual acini (mean absolute percentage error = 9.5%) and demonstrated that as individual acini increased in size, the space available for luminal cells increased at a faster rate than the space available for basal cells (Fig. 2F). Thus, geometric constraints underlie at least part of the observed differences in epithelial cell proportions between parous and nulliparous samples.
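
The geometric argument can be sketched in a few lines of code. The example below is a minimal illustration, not the authors' implementation: it treats an acinus as an annulus with outer diameter D and shell thickness k·D, where the thickness fraction `k` is a hypothetical parameter (the fitted value is not stated in this section). With these assumptions the luminal area is πD²k(1 − k) and the outer perimeter is πD, so the area-to-perimeter ratio is D·k(1 − k), which grows linearly with diameter.

```python
import math


def luminal_area(diameter, k):
    """Area of the luminal shell: an annulus with outer diameter
    `diameter` and thickness k * diameter.
    ASSUMPTION: k (0 < k < 0.5) is an illustrative thickness fraction,
    not a value reported in the paper."""
    outer_r = diameter / 2.0
    inner_r = outer_r - k * diameter
    return math.pi * (outer_r ** 2 - inner_r ** 2)


def outer_perimeter(diameter):
    """Outer perimeter of the luminal layer (space for basal cells)."""
    return math.pi * diameter


def area_to_perimeter(diameter, k):
    """Luminal space per unit of basal space; equals diameter * k * (1 - k)."""
    return luminal_area(diameter, k) / outer_perimeter(diameter)
```

Because the ratio scales with diameter, smaller acini offer proportionally more basal space per unit of luminal space, which is the direction of the effect described for parous samples.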

Obesity is associated with a reduction in the proportion of HR+ luminal cells

While parity was associated with a decreased overall proportion of luminal cells in the epithelium, the proportions of individual HR+ and secretory subtypes within the luminal compartment were highly variable. Consistent with previous work (5, 22), we observed reduced frequencies of HR+ luminal cells in parous women (p < 0.03). However, the proportion of secretory luminal cells was not associated with parity (fig. S4A). Together, these data suggested that additional factors influence the relative proportion of HR+ versus secretory cells within the luminal compartment. We therefore performed multiple comparison analysis to test for the effects of parity, BMI, race, age, and hormonal contraceptive use on the proportions of HR+ versus secretory cells in the luminal compartment. We found that the relative proportion of HR+ luminal cells was reduced in obese women (BMI > 30) (Fig. 3A) but did not vary with other discriminating factors such as age, reproductive history, hormonal contraceptive use, or race (fig. S6A). On a continuous scale, each 12 units of BMI was associated with a 2-fold reduction in the proportion of HR+ cells in the luminal compartment (fig. S6B). We observed similar results using clustering analysis from the 10,795 cells in the luminal sort gate (fig. S6C).
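
To make the continuous-scale statement concrete, the sketch below converts "a 2-fold reduction per 12 BMI units" into a fold change for an arbitrary BMI difference. It assumes a log-linear relationship between BMI and the HR+ proportion, as would follow from a regression with a log link; the function names and the `units_per_halving` parameter are hypothetical and only restate the figure quoted above.

```python
import math


def fold_change(delta_bmi, units_per_halving=12.0):
    """Expected multiplicative change in the HR+ proportion for a BMI
    difference of `delta_bmi`, ASSUMING a log-linear relationship in
    which the proportion halves every `units_per_halving` BMI units."""
    return 2.0 ** (-delta_bmi / units_per_halving)


def log_slope(units_per_halving=12.0):
    """Equivalent slope on the natural-log scale, per BMI unit."""
    return -math.log(2.0) / units_per_halving
```

For example, `fold_change(12)` is 0.5 by construction, and a 24-unit difference corresponds to a 4-fold reduction under the same assumption.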

Fig. S6. The proportion of HR+ luminal cells is reduced in obese women and does not vary with other discriminating factors.

(A) Proportion of HR+ luminal cells in each sample (dots) stratified by age, reproductive history, hormonal contraceptive use, or race (p > 0.05, Wald test). (B) Quasi-Poisson regression model of the proportion of HR+ cells in the luminal compartment as a function of BMI (FDR < 0.001, Wald test). (C) UMAP plot of sorted luminal cells from non-obese (BMI < 30) and obese (BMI > 30) samples, highlighting hormone-responsive (HR+) and secretory luminal cells.

Fig. 3. Obesity is associated with a decrease in the proportion of hormone-responsive cells in the luminal compartment.

(A) Left: UMAP plot of sorted live singlet and epithelial cells from non-obese (BMI < 30) and obese (BMI ≥ 30) samples, highlighting hormone-responsive (HR+) and secretory luminal cells. Right: Quantification of the proportion of HR+ or secretory cells in the luminal compartment of obese versus non-obese samples (n = 16 samples; FDR < 0.0002, Wald test). (B) A quasi-Poisson regression model accurately predicts the proportion of HR+ cells in the luminal compartment as a function of BMI in an independent cohort of Komen Tissue Bank core biopsy samples (predictive R2 = 0.62, mean absolute percentage error = 14.8%; see also fig. S6B and methods). (C) Left: UMAP depicting expression of KRT23 in log counts. Right: Dot plot depicting the log normalized mean and frequency of KRT23, ESR1, and PGR expression across luminal cell types. (D) Co-immunostaining of PR, KRT23, and the pan-luminal marker KRT7 and quantification of the percentage of PR+ cells within the KRT7+/KRT23− and KRT7+/KRT23+ luminal cell populations (n = 16 samples; p < 0.001, Mann-Whitney test). (E) Co-immunostaining of KRT23 and KRT7 and linear regression analysis of the percentage of KRT23+ luminal cells versus BMI (n = 10 samples; R2 = 0.71, p < 0.003, Wald test). Scale bars 50 μm. (F) Summary of changes in epithelial cell proportions with pregnancy and obesity: parity is associated with an increase in the proportion of basal cells and corresponding decrease in the proportion of luminal cells, whereas obesity is associated with a decrease in the proportion of HR+ cells in the luminal compartment.

One limitation of the reduction mammoplasty dataset was that all samples classified as non-obese were from nulliparous women less than 24 years old, whereas obese samples were more likely to be from parous and older women (table S1). Therefore, we performed scRNAseq analysis on an independent set of breast core biopsies from healthy premenopausal women who donated tissue to the Komen Tissue Bank (KTB). In contrast with the reduction mammoplasty cohort, the KTB cohort consisted of older (37-47 years) parous samples with BMI in the normal or overweight range (BMI 20.7-28.3) (table S1, fig. S7A). We used MULTI-seq for sample multiplexing and collected pooled live/singlet and epithelial cells (fig. S7B). Unsupervised clustering identified one myoepithelial/basal cell cluster, two luminal cell clusters, and five stromal clusters (fig. S7C). As in our previous analysis, the two luminal clusters were identified as HR+ and secretory luminal cells based on the expression of known markers (fig. S7, D and E). Using the reduction mammoplasty cohort as a training set, we accurately predicted the proportion of HR+ luminal cells in the KTB cohort with a mean absolute percentage error of 14.8% (Fig. 3B).
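
For readers unfamiliar with the error metric quoted here, the sketch below shows the standard definition of mean absolute percentage error (MAPE): the average of |observed − predicted| / |observed|, expressed as a percentage. This is the textbook formula and is assumed, not confirmed, to match the calculation in the methods; the function name `mape` is hypothetical.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    ASSUMPTION: standard definition; the paper's methods may differ in
    detail. Observed values must be non-zero, since each error is
    expressed relative to the observed value.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("observed values must be non-zero")
    errors = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)
```

A MAPE of 14.8% therefore means that, on average, the predicted HR+ proportion for a held-out sample differed from the measured proportion by about 15% of the measured value.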

Fig. S7. Summary of scRNAseq analysis of samples from the Komen Tissue Bank.

(A) Scatter plots highlighting differences in body mass index (BMI), reproductive history, and average age between the Komen Tissue Bank (KTB) and reduction mammoplasty cohorts (see also table S1). Trendline depicts the positive association of BMI with Age in the reduction mammoplasty cohort. (B) TSNE dimensionality reduction of the normalized barcode count matrices and final sample classification for MULTI-seq barcoding. (C) UMAP dimensionality reduction and unsupervised clustering of the combined data from seven KTB samples identifies the major epithelial and stromal cell types in the breast. (D) UMAPs depicting expression of selected markers in log counts. (E) Heatmap highlighting marker genes used to identify each cell type. For visualization purposes, we randomly selected 50 cells from each cluster.

To verify these results in tissue sections, we performed immunostaining for ER and PR. There was a trend toward decreased expression of PR with increasing BMI, but the change was not statistically significant (p = 0.11, fig. S8A). Notably, ER and PR expression was variable and partly non-overlapping, ranging from 11-71% overlap (fig. S8B). As we had previously also observed heterogeneous expression of ESR1 and PGR transcripts within the HR+ luminal cell cluster (fig. S2, C and D), we hypothesized that the variability in staining was due to changes in ER and PR expression, stability, and nuclear localization that have all been previously observed based on hormone receptor activation status (19, 23, 24).

Fig. S8. Hormone receptor expression is highly variable.

(A) Top: Co-immunostaining of PR and KRT7 and linear regression analysis of the percentage of PR+ luminal cells versus BMI (n = 10 samples; R2 = 0.29, p = 0.11, Wald test). Bottom: Co-immunostaining of ER and KRT7 and linear regression analysis of the percentage of ER+ luminal cells versus BMI (n = 8 samples; R2 = 0.06, p = 0.56, Wald test). Scale bars 50 μm. (B) Venn diagram highlighting the average percent overlap between ER and PR as measured by immunostaining (n = 5 samples, range = 11-71%). (C) Multiplexed in situ hybridization of estrogen receptor transcript (ESR1) and immunostaining for estrogen receptor protein (ER) and KRT7. Right: Plots depicting the expression of ESR1 and ER across multiple tissue sections (R2 = 0.6, p < 0.01, Wald test) or within individual cells (p = 0.63, Wilcoxon matched pairs signed-rank test). Scale bars 25 μm. (D) Table and bar plot depicting the sensitivity and specificity for ESR1 or PGR transcript expression in the HR+ luminal cell versus secretory luminal cell cluster.

Based on this, we predicted that ER/PR transcript and protein expression levels would co-vary across samples due to the overall proportion of HR+ luminal cells and the hormonal microenvironment, but would be stochastically expressed in individual cells at any one time due to dynamic fluctuations in mRNA and protein expression and stability. To test this, we performed co-immunostaining and RNA-FISH and confirmed that although ER transcript and protein levels correlate across tissue sections, they do not correlate on a per-cell basis—on average, only 31% of cells expressing ESR1 transcript also expressed ER protein (fig. S8C). Importantly, our scRNA-seq analysis demonstrated that the expression of ESR1 or PGR transcript was highly specific for cells in the HR+ luminal cluster, although the overall proportion of HR+ cells that expressed each transcript was low and varied across individuals (fig. S8D). Thus, these data demonstrate that immunostaining for nuclear hormone receptors underestimates the fraction of cells in the HR+ lineage and that lack of ER/PR expression cannot be used to reliably define a cell as part of the secretory versus HR+ luminal cell lineage.
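
The sensitivity and specificity comparison described above (fig. S8D) can be illustrated with a minimal sketch. It assumes that transcript detection in a cell is treated as a binary test for membership in the HR+ luminal cluster; the cell records and the function name are hypothetical and the values are invented for illustration, not taken from the dataset.

```python
# Hypothetical toy data: (cluster label, transcript detected?) per cell.
# Values are invented for illustration only.
cells = [
    ("HR+", True), ("HR+", False), ("HR+", False), ("HR+", True),
    ("HR+", False), ("secretory", False), ("secretory", False),
    ("secretory", False), ("secretory", True), ("secretory", False),
]

def sensitivity_specificity(cells, positive_cluster="HR+"):
    """Treat transcript detection as a test for membership in positive_cluster."""
    tp = sum(1 for c, d in cells if c == positive_cluster and d)
    fn = sum(1 for c, d in cells if c == positive_cluster and not d)
    tn = sum(1 for c, d in cells if c != positive_cluster and not d)
    fp = sum(1 for c, d in cells if c != positive_cluster and d)
    sensitivity = tp / (tp + fn)  # HR+ cells in which the transcript is detected
    specificity = tn / (tn + fp)  # non-HR+ cells in which it is not detected
    return sensitivity, specificity

sens, spec = sensitivity_specificity(cells)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy example the marker is specific but not sensitive, which mirrors the qualitative pattern reported in the text: detection supports HR+ identity, whereas absence of detection says little.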

On the basis of these results, we sought to identify another marker to distinguish between luminal subpopulations, and identified keratin 23 (KRT23) as highly enriched in the secretory luminal cell cluster (Fig. 3C), as was also reported by a previous scRNAseq study (17). Immunohistochemistry for KRT23 and PR or ER confirmed that these proteins are expressed in mutually exclusive luminal populations (Fig. 3D, and fig. S9, A and B). The proportion of KRT23+ luminal cells in each sample was also highly correlated with the proportion of secretory luminal cells identified by scRNAseq (fig. S9C). KRT23 thus represents a discriminatory marker between the two luminal populations. Staining in intact tissue sections confirmed that the proportion of KRT23+ secretory luminal cells increased by about 17% for every 10-unit increase in BMI (Fig. 3E). Together, these data demonstrate that there are two independent effects of reproductive history and body weight on cell proportions in the mammary epithelium: parity affects the ratio of basal to luminal cells whereas BMI affects the ratio of HR+ versus secretory luminal cells (Fig. 3F).

Fig. S9. Keratin 23 is a specific marker of cells in the secretory luminal cell lineage.

(A) Representative images of co-immunostaining of PR, KRT23, and the pan-luminal marker KRT7. (B) Co-immunostaining of ER, KRT23, and the pan-luminal marker KRT7 and quantification of the percentage of ER+ cells within the KRT7+/KRT23- and KRT7+/KRT23+ luminal cell populations (n = 5 samples; p < 0.01 Mann-Whitney test). Scale bars = 50 μm. (C) Linear regression analysis of the percentage of luminal cells in the secretory lineage identified by scRNAseq clustering versus the percentage of KRT23+ luminal cells identified by immunostaining (n = 15 samples; R2 = 0.71, p < 0.0001, Wald test).

Hormone signaling is a primary axis of transcriptional variability in HR+ luminal cells

Beyond differences in cell proportions, we found that transcriptional cell state within clusters was a second source of inter-sample variability in our dataset (Fig. 1D, and fig. S3, C, D, and E). Since estrogen and progesterone are master regulators of breast development, we hypothesized that hormone signaling would represent a major source of transcriptional heterogeneity. Consistent with this, we previously observed a high degree of sample-to-sample variation in ER/PR expression (fig. S8D) within the HR+ luminal cell cluster, which has been shown to vary based on hormone receptor activation state (19, 23, 24).

To quantify cell state in HR+ luminal cells, we performed principal component (PC) analysis on this population. Analysis of ranked gene loadings demonstrated that variation across PC1 in HR+ cells was driven by genes involved in the response to hormone receptor activation, including the essential PR target genes TNFSF11 (RANKL) and WNT4 (6, 25) (Fig. 4A). Of the 20 genes with the highest loadings in PC1, 12 have been previously described as associated with either progesterone signaling, estrogen signaling, or the luteal phase of the menstrual cycle when progesterone is at its peak (fig. S10A, table S4) (6, 26–35). Thus, transcriptional changes associated with hormone signaling state (PC1) are a dominant source of variation in HR+ luminal cells (fig. S10B).
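
The idea of ranking genes by their PC1 loadings can be sketched as follows. This is an illustration only: it uses power iteration on a small covariance matrix rather than whatever PCA implementation was used for the analysis, and the gene names and simulated expression values are hypothetical.

```python
import math
import random

def first_principal_component(X, n_iter=200, seed=0):
    """Return unit-length PC1 gene loadings for X (cells x genes) by power iteration."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # Sample covariance matrix (genes x genes).
    cov = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) / (n - 1)
            for b in range(p)] for a in range(p)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Simulated cells: two hypothetical genes follow a shared latent state,
# two hypothetical genes are low-level noise.
rng = random.Random(1)
genes = ["responsive_gene_1", "responsive_gene_2", "noise_gene_1", "noise_gene_2"]
X = []
for _ in range(60):
    state = rng.gauss(0, 1)
    X.append([2.0 * state + rng.gauss(0, 0.1),
              1.5 * state + rng.gauss(0, 0.1),
              rng.gauss(0, 0.1),
              rng.gauss(0, 0.1)])

loadings = first_principal_component(X)
# The sign of a principal component is arbitrary, so rank by absolute loading.
ranked = sorted(zip(genes, loadings), key=lambda gl: abs(gl[1]), reverse=True)
for gene, loading in ranked:
    print(f"{gene}: {loading:+.3f}")
```

Genes that co-vary with the dominant latent state receive the largest absolute loadings, which is the property used above to interpret PC1 as a hormone signaling axis.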

Fig. S10. Matrix decomposition analysis of HR+ luminal cells.

(A) Heatmap highlighting the 20 genes with the highest loadings in PC1, annotated by their association with estrogen signaling, progesterone signaling, or the luteal phase of the menstrual cycle. HR+ luminal cells are ordered by their cell loadings in PC1. (B) Bar chart depicting the proportion of variance explained by each of the top 20 principal components. (C) Parameter selection for non-negative matrix factorization based on KL divergence plots (methods). (D) Heatmap of cell loadings across each metagene for HR+ luminal cells. (E) PCA plot of HR+ luminal cells depicting expression of HR+ metagene 8. (F) Gene set enrichment analysis of HR+ cell metagene 8, showing the top pathways identified from the Molecular Signatures Database Hallmark and GO gene sets.

Fig. 4. Hormone signaling is a primary axis of transcriptional variability in HR+ luminal cells.

(A) PCA plot of HR+ luminal cells depicting expression of WNT4 and TNFSF11 (RANKL) in log counts. (B) Non-negative matrix factorization identifies a specific gene signature of hormone signaling in HR+ luminal cells. Heatmap depicting the top 20 genes expressed in each HR+ cell metagene, highlighting marker genes in HR+ metagene 8. (C) Gene set enrichment analysis of HR+ cell metagene 8, showing enrichment of genes shown to be upregulated during the luteal phase of the menstrual cycle (29) (NES = 2.16, p < 1e-9). (D) Ridge plots depicting the distribution of HR+ metagene 8 (hormone signaling) expression across samples, and quantification of the average expression of metagene 8 in nulliparous (NP) versus parous (P) samples (n = 22 samples, p = 0.04, Mann-Whitney test). (E) Immunostaining for p63, TCF7, and KRT7, and quantification of the percentage of TCF7+ cells within the p63+ basal cell compartment for nulliparous (NP) versus parous (P) samples (n = 15 samples; p < 0.002, Mann-Whitney test).

As PC analysis seeks to maximize the variance of a projected dataset, it may combine gene signatures from multiple transcriptional states into a single component (36). Therefore, we performed non-negative matrix factorization (NMF) to identify a specific gene signature of hormone signaling, and identified 9 distinct gene expression programs, or “metagenes”, in HR+ luminal cells (Fig. 4B, and fig. S10, C and D) {Welch:2019dz, Yang:2016fu}. Cell embedding in PC1 was highly correlated with expression of metagene 8 (Pearson correlation = 0.79, fig. S10E). Analysis of ranked gene loadings demonstrated that this “hormone signaling” metagene comprised a gene expression program similar to that of PC1, including the PR targets TNFSF11 and WNT4 and the ER target TFF3 (Fig. 4B). The hormone signaling metagene was enriched for genes upregulated during the luteal phase of the menstrual cycle (Fig. 4C) (29), and for transcripts in the Molecular Signatures Database Hallmark “early estrogen response” and “late estrogen response” gene sets (fig. S10F) (39). Thus, NMF identified a distinct transcriptional signature for hormone receptor activation in HR+ luminal cells.
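
To make the decomposition into metagenes concrete, the following is a minimal sketch of NMF on a toy expression matrix. It assumes multiplicative updates under the generalized Kullback-Leibler divergence, which is one common formulation and not necessarily the implementation described in the Methods; the matrix values, function names, and the choice of rank 2 are hypothetical.

```python
import math
import random

def reconstruct(W, H):
    """Matrix product of W (cells x k) and H (k x genes)."""
    return [[sum(W[i][k] * H[k][j] for k in range(len(H)))
             for j in range(len(H[0]))] for i in range(len(W))]

def kl_divergence(V, W, H):
    """Generalized KL divergence D(V || WH); assumes WH is strictly positive."""
    WH = reconstruct(W, H)
    total = 0.0
    for i, row in enumerate(V):
        for j, v in enumerate(row):
            if v > 0:
                total += v * math.log(v / WH[i][j])
            total += WH[i][j] - v
    return total

def nmf_kl(V, rank, n_iter=200, seed=0):
    """Factor V (cells x genes) into W (cell loadings) and H (metagene gene loadings)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        WH = reconstruct(W, H)
        for k in range(rank):  # update metagene gene loadings
            col_sum = sum(W[i][k] for i in range(n))
            for j in range(m):
                H[k][j] *= sum(W[i][k] * V[i][j] / WH[i][j] for i in range(n)) / col_sum
        WH = reconstruct(W, H)
        for i in range(n):  # update per-cell metagene loadings
            for k in range(rank):
                row_sum = sum(H[k][j] for j in range(m))
                W[i][k] *= sum(H[k][j] * V[i][j] / WH[i][j] for j in range(m)) / row_sum
    return W, H

# Toy matrix (6 cells x 4 genes) with two invented expression programs.
V = [[5, 4, 1, 1], [6, 5, 1, 1], [4, 4, 1, 2],
     [1, 1, 5, 6], [1, 2, 4, 5], [1, 1, 6, 5]]
W0, H0 = nmf_kl(V, rank=2, n_iter=0)  # random initialization only
W, H = nmf_kl(V, rank=2)
print(kl_divergence(V, W0, H0), kl_divergence(V, W, H))
```

Each row of H plays the role of a metagene (gene loadings) and each row of W gives one cell's loadings across metagenes. Comparing the divergence across candidate ranks is one way to read the KL divergence plots mentioned for parameter selection, although the exact selection criterion is not specified here.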

The hormone signaling response of HR+ luminal cells is reduced in parous women

Previous epidemiologic analyses have demonstrated that the protective effect of parity against breast cancer is specific for ER+/PR+ tumors (40). Decreased hormone responsiveness following pregnancy is one proposed mechanism for this effect (8). Supporting this, previous studies demonstrated decreased expression of the PR effector WNT4 following pregnancy (5, 22, 41). Moreover, in an explant culture model, estrogen induced expression of the ER target gene AREG only in nulliparous women (4). As the effects of hormones in the breast are primarily mediated by paracrine signaling from HR+ luminal cells, this decreased hormone responsiveness could be caused by either: 1) a change in the magnitude of paracrine signals produced by each HR+ luminal cell, and/or 2) a reduction in the overall proportion of HR+ luminal cells leading to a “dilution” of paracrine signals following ER/PR activation. It has been difficult to distinguish between these mechanisms using tissue-level analyses. By probing the single-cell transcriptional landscape of the HR+ luminal cell population, NMF analysis provides a means to directly interrogate whether parity influences the per-cell hormone signaling response of HR+ luminal cells.

To quantify variation in hormone signaling, we first measured the similarity between each sample’s single-cell distribution across metagene 8. Hierarchical clustering identified two sets of samples, representing high or low hormone signaling (fig. S11A). Based on this, we found that while the level of hormone signaling in HR+ luminal cells varied between nulliparous women, likely reflecting changing hormone levels across the menstrual cycle, per-cell hormone signaling in HR+ luminal cells was significantly reduced in parous women (Fig. 4D, and fig. S11B). To identify differentially expressed genes between nulliparous and parous women with high sensitivity, we generated a pseudo-bulk dataset of aggregated HR+ luminal cells from each sample (Methods) and confirmed that parous women had decreased expression of the canonical hormone-responsive genes TFF1, PGR, WNT4, TNFSF11, and AREG (fig. S11C, table S5). Notably, the progesterone receptor itself is an ER target gene (42). Staining for the progesterone receptor and KRT23 confirmed that PR expression was reduced in the HR+ luminal cell subpopulation (KRT7+/KRT23-) of parous samples (fig. S11D).
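
The sample-to-sample similarity used above, (1 - Jensen-Shannon distance) between per-sample distributions of metagene 8 expression, can be sketched as follows. The binning scheme, sample names, and expression values are assumptions made for illustration, and the subsequent hierarchical clustering step is omitted.

```python
import math

def histogram(values, n_bins=5, lo=0.0, hi=1.0):
    """Normalized histogram of per-cell metagene expression on shared bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]

def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (log base 2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))

# Hypothetical per-cell metagene expression for three samples (invented values).
samples = {
    "sample_high_1": [0.83, 0.91, 0.72, 0.87, 0.65, 0.94],
    "sample_high_2": [0.77, 0.89, 0.81, 0.68, 0.73, 0.92],
    "sample_low_1": [0.12, 0.23, 0.17, 0.31, 0.07, 0.26],
}
hists = {name: histogram(vals) for name, vals in samples.items()}
names = sorted(hists)
similarity = {(a, b): 1 - js_distance(hists[a], hists[b]) for a in names for b in names}
for (a, b), s in similarity.items():
    print(f"{a} vs {b}: {s:.2f}")
```

With log base 2 the distance is bounded between 0 and 1, so the similarity is as well. A matrix of this kind is what would be passed to hierarchical clustering to separate samples with high and low hormone signaling.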

Fig. S11. Parity is associated with a decrease in the per-cell hormone signaling response of HR+ luminal cells.

(A) Heatmap showing the similarity between each sample’s single-cell expression distribution across HR+ cell metagene 8, measured as (1 - Jensen-Shannon distance). Hierarchical clustering identifies two sets of samples representing high or low expression of the “hormone signaling” metagene (ward D2). (B) PCA plot of HR+ luminal cells in nulliparous or parous women depicting expression of HR+ cell metagene 8. (C) Volcano plot highlighting the differential expression of canonical hormone-responsive genes between parous and nulliparous samples in HR+ luminal cells. (D) Immunostaining for PR, KRT23, and KRT7, and quantification of the percentage of PR+ cells within the KRT23-/KRT7+ luminal cell compartment for nulliparous (NP) versus parous (P) samples (n = 15 samples; p < 0.03, Mann-Whitney test). (E) Immunostaining for p63, TCF7, and KRT7 in ducts versus TDLUs, and quantification of the percentage of TCF7+ cells within the p63+ basal cell compartment (n = 14 samples; p = 0.64, Mann-Whitney test).

Finally, we confirmed that paracrine signaling downstream of PR activation was specifically reduced in parous samples by assessing the effects of one of these genes, WNT4. As WNT4 from HR+ luminal cells has been shown to signal to basal cells (25), we performed co-immunostaining for the WNT effector TCF7 and basal cell marker p63 and found that TCF7 expression was markedly decreased in parous samples (Fig. 4E). This decrease was not due to differences in epithelial architecture, as TCF7 staining in ducts versus TDLUs within the same samples was unchanged (fig. S11E). Together, these data demonstrate that transcriptional variation among HR+ luminal cells is primarily related to hormone signaling, that transcription along this axis (HR+ metagene 8) is reduced in women with prior history of pregnancy, and that these transcriptional changes coincide with a reduction in downstream paracrine signaling to basal cells.

Identification of coordinated changes in signaling states across cell types in the breast

The above results established that parity was associated with a change in the per-cell hormone signaling of HR+ luminal cells. As the global effects of ER/PR activation in the breast are controlled by paracrine signaling from HR+ luminal cells to other cell types, we reasoned that hormone receptor activation in HR+ luminal cells would be linked to transcriptional changes in other cell types representing the downstream paracrine response. To identify putative transcriptional signatures of the paracrine response, we developed a computational framework that leverages the person-to-person transcriptional heterogeneity observed within cell types to find coordinated changes in cell signaling states across samples. First, we decomposed each cell type into a set of distinct gene expression programs, or “metagenes”, using NMF as described above (fig. S10, C and D, and fig. S12, A and B). We then quantified the average expression of each metagene for each sample and constructed a weighted network of coordinated gene expression programs based on the pair-wise Pearson correlations between metagenes (fig. S12C, methods). Finally, we identified modules of highly correlated gene expression programs using the infomap community detection algorithm (43). Using this approach, we identified three major modules, annotated here as “resting state”, “paracrine signaling”, and “involution” modules, comprising highly interconnected transcriptional states across cell types in the breast (Fig. 5A).
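
The network construction described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the per-sample metagene averages and metagene names are invented, the significance filter (p < 0.05) is omitted, and modules are found as connected components, which is a deliberately simple stand-in for the infomap community detection algorithm and will not in general give the same partition.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical average expression of each metagene in six samples (invented).
metagene_means = {
    "HR+_hormone_signaling": [0.1, 0.4, 0.8, 0.9, 0.3, 0.6],
    "basal_paracrine_response": [0.2, 0.5, 0.7, 1.0, 0.2, 0.6],
    "fibroblast_remodeling": [0.0, 0.3, 0.9, 0.8, 0.4, 0.5],
    "HR+_resting": [0.9, 0.6, 0.2, 0.1, 0.7, 0.4],
    "basal_resting": [0.8, 0.7, 0.1, 0.2, 0.6, 0.5],
}

def build_edges(means, r_min=0.5):
    """Weighted edges between metagenes whose correlation exceeds r_min."""
    names = sorted(means)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(means[a], means[b])
            if r > r_min:
                edges[(a, b)] = r
    return edges

def connected_components(nodes, edges):
    """Group nodes into modules (stand-in for infomap, see lead-in)."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, modules = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        modules.append(sorted(component))
    return modules

edges = build_edges(metagene_means)
modules = connected_components(metagene_means, edges)
for module in modules:
    print(module)
```

In this toy example, metagenes that rise and fall together across samples fall into one module and anti-correlated programs into another, which is the intuition behind the “paracrine signaling” and “resting state” modules.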

Fig. S12. Matrix decomposition analysis of secretory luminal cells, basal/myoepithelial cells, and fibroblasts.

(A) Parameter selection for non-negative matrix factorization based on KL divergence plots (methods). (B) Heatmap of cell loadings across each metagene for the indicated cell types. (C) Left: Network graph of coordinated gene expression programs in the human breast. Nodes represent distinct metagenes in the indicated cell types, and edges connect highly correlated metagenes (Pearson correlation coefficient > 0.5 and p < 0.05). Modules of highly correlated gene expression programs were identified using the infomap community detection algorithm. The “hormone signaling” metagene in HR+ cells (HR+ metagene 8) is highlighted in red. Right: Heatmap depicting Pearson correlation coefficients between all metagenes.

Fig. 5. Identification of coordinated changes in signaling states across cell types in the breast.

(A) Left: Network graph of correlated gene expression programs in the human breast. Nodes represent distinct metagenes in the indicated cell types, and edges connect highly correlated metagenes (Pearson correlation coefficient > 0.5; p < 0.05). Modules of highly correlated gene expression programs were identified using the infomap community detection algorithm. The “hormone signaling” metagene in HR+ cells (HR+ metagene 8) is highlighted in red. Right: Heatmap depicting Pearson correlation coefficients between metagenes in the three major modules (Resting state, Paracrine signaling, Involution). (B) Linear regression analysis of basal cell state across metagene 10 (paracrine response) versus HR+ luminal cell state across metagene 8 (hormone signaling) (R2 = 0.57, p < 3e-6, Wald test). Dots represent the average expression of each metagene within a sample, colored by the proportion of HR+ luminal cells in the epithelium for that sample. (C) Summary of multiple linear regression analysis with three predictors: HR+ cell hormone signaling (HR+ metagene 8), the frequency of HR+ cells in the epithelium, and an interaction term representing the combined effects of HR+ signaling and frequency (Signaling × Frequency). (D) Ridge plots depicting the distribution of basal cell metagene 10 (paracrine response) expression across samples, and quantification of the average expression in obese (BMI > 30) versus non-obese (BMI ≤ 30) samples (n = 16 samples, p < 0.003, Mann-Whitney test). (E) Schematic depicting how parity and obesity lead to decreased hormone signaling in the breast through distinct mechanisms. Parity directly affects the per-cell hormone response in HR+ luminal cells, whereas BMI leads to a reduction in the proportion of HR+ luminal cells in the epithelium.

The “resting state” module consisted of gene expression programs that were anti-correlated with hormone signaling in HR+ luminal cells (Fig. 5A, and fig. S13A). Metagenes in this module were primarily enriched for pathways involved in ribosome biogenesis and mRNA processing (fig. S13B). The “paracrine signaling” module comprised gene expression programs that were positively correlated with hormone signaling in HR+ luminal cells (Fig. 5A, and fig. S14A). As expected based on the central role HR+ cells play in the response to estrogen and progesterone, the HR+ hormone signaling metagene (HR+ metagene 8) had the greatest influence on information flow within this module, as measured by betweenness centrality (fig. S14A). Our analysis revealed that high levels of hormone signaling in HR+ cells coincided with the emergence of a second transcriptional state, HR+ metagene 5, in a distinct subpopulation of HR+ luminal cells (fig. S14B). Marker analysis and gene set enrichment analysis demonstrated that HR+ metagene 5 was characterized by upregulation of a hypoxia gene signature and pro-angiogenic factors such as VEGFA and ANGPTL4 (fig. S14C). Interestingly, a previous study using microdialysis of healthy human breast tissue found that VEGF levels increased in the luteal phase of the menstrual cycle (44). As estrogen response elements have been identified in the untranslated regions of VEGFA (45), our results suggest that this increased expression may be, in part, a direct effect of hormone signaling to this subpopulation of HR+ cells.

Fig. S13. The “Resting state” module consists of metagenes negatively correlated with hormone signaling in HR+ luminal cells.

(A) Network subgraph of the “resting state” module, and heatmap depicting Pearson correlation coefficients between all metagenes and levels of significance (* p < 0.05, ** p < 0.01, *** p < 0.001). (B) Gene set enrichment analysis for the indicated metagenes, showing the top pathways identified from GO gene sets.

Fig. S14. The “Paracrine signaling” module consists of metagenes positively correlated with hormone signaling in HR+ luminal cells.

(A) Network subgraph of the “paracrine signaling” module, and heatmap depicting Pearson correlation coefficients between all metagenes and levels of significance (* p < 0.05, ** p < 0.01, *** p < 0.001). Right: Betweenness centrality for each metagene within the paracrine signaling module. (B) Left: Linear regression analysis of HR+ cell state across metagene 5 (“hypoxia”) versus metagene 8 (“hormone signaling”) (R2 = 0.37, p < 0.0004, Wald test). Dots represent the average expression of each metagene within a sample. Right: Scatter plot of HR+ cell expression of metagene 5 versus metagene 8. Dots represent the expression of each metagene within individual HR+ luminal cells. (C) Left: Heatmap depicting the top 20 genes expressed in each HR+ cell metagene, highlighting metagene 5. Right: Gene set enrichment analysis for HR+ metagene 5, showing the top pathways identified from the Molecular Signatures Database Hallmark and GO gene sets. (D) Left: Heatmap depicting the top 20 genes expressed in each secretory luminal cell metagene, highlighting metagene 8. Right: Gene set enrichment analysis for secretory cell metagene 8, showing the top pathways identified from the Molecular Signatures Database Hallmark and GO gene sets. (E) Gene set enrichment analysis of secretory luminal cell metagene 8, showing enrichment of genes upregulated during the luteal phase of the menstrual cycle (29) (NES = 2.38, p < 2e-30). (F) Left: Heatmap depicting the top 20 genes expressed in each basal cell metagene, highlighting metagene 10. Right: Gene set enrichment analysis for basal cell metagene 10, showing the top pathways identified from the Molecular Signatures Database Hallmark, GO, and Canonical Pathways gene sets. (G) Left: Heatmap depicting the top 20 genes expressed in each fibroblast metagene, highlighting metagenes 5 and 7. Right: Gene set enrichment analysis for fibroblast metagenes 5 and 7, showing the top pathways identified from the Molecular Signatures Database Hallmark and GO gene sets.

We next investigated gene expression programs in other epithelial and stromal populations that correlated with hormone signaling in HR+ luminal cells. NMF and network analysis identified a subpopulation of proliferative secretory luminal cells within the paracrine signaling module (fig. S12B, and fig. S14D). This “proliferation” metagene was highly enriched for cell-cycle related genes previously found to be upregulated during the luteal phase of the menstrual cycle (fig. S14E) (29). Moreover, similar to HR+ cells, basal/myoepithelial cells in samples with high levels of hormone signaling had enrichment of transcripts involved in hypoxia and angiogenesis such as VEGFA and ANGPTL4 (fig. S14F). Gene set enrichment analysis demonstrated that variation across this basal cell “paracrine response” metagene was driven by genes involved in epithelial-mesenchymal transition, cell motility, and extracellular matrix (ECM) organization (fig. S14F), suggesting that changes in actomyosin contractility and cell-ECM interactions underlie the previously reported morphological changes observed in the breast epithelium across the menstrual cycle (46). Finally, previous studies have identified alterations in stromal organization and ECM composition across the menstrual cycle (47, 48). Consistent with this, hormone signaling in HR+ luminal cells correlated with two distinct gene expression programs in fibroblasts: a “tissue remodeling” metagene characterized by upregulation of ECM proteins including collagens (COL3A1, COL1A1, COL1A2) and fibronectin (FN1), and a “proinflammatory” gene expression program representing upregulation of cytokines and growth factors such as IL6 and TGFB3 (fig. S14G).

Finally, gene set enrichment analysis of the third module uncovered a transcriptional signature in HR+ and secretory luminal cells that was similar to that identified during post-lactational involution (fig. S15, A and B) (49, 50). These “involution” metagenes were characterized by high expression of death receptor ligands such as TNFSF10 (TRAIL) and genes involved in the defense and immune response, including interferon-response genes (fig. S15, B and C). The involution signature in secretory luminal cells was also characterized by expression of major histocompatibility complex class II (MHCII) molecules and the phagocytic receptors CD14 and MARCO (fig. S15B), suggesting that these cells play a role as nonprofessional phagocytes in the clearance of apoptotic cells, similar to what has been described during involution (51). Previous data have demonstrated that the fraction of apoptotic cells in the mammary epithelium peaks between the late luteal and early follicular phases of the menstrual cycle (52). Notably, TGFB3 is a major signaling molecule involved in post-lactational involution that enhances phagocytosis by mammary epithelial cells (53), suggesting that TGFB3 secreted by fibroblasts at the end of the luteal phase (fig. S14G) activates a subset of secretory luminal cells during the late luteal/early follicular phase that go on to express “involution” markers including phagocytic receptors.

Fig. S15. The “Involution” module consists of metagenes enriched for genes upregulated during post-lactational involution.

(A) Network subgraph of the “involution” module, highlighting the two metagenes most closely associated with an “involution-like” gene signature. (B) Heatmap depicting the top 20 genes expressed in each HR+ cell or secretory cell metagene, highlighting “involution-like” gene signatures. Right: Gene set enrichment analysis of the indicated metagenes, showing enrichment of genes upregulated during post-lactational involution (70) (HR+ metagene 3: NES = 1.65, p < 0.008; Secretory cell metagene 2: NES = 1.97, p < 2e-6). (C) Gene set enrichment analysis for the indicated metagenes, showing the top pathways identified from the Molecular Signatures Database Hallmark and GO gene sets.

Together, these results demonstrate how the underlying sample-to-sample variability in scRNAseq data can be used to infer cell-cell communication networks. Using this computational framework, we find that paracrine signaling from HR+ luminal cells is a driver of transcriptional variability across all major cell types in the breast. Strikingly, many of these changes closely mimic those seen during the pregnancy/involution cycle that have been linked to a transient increased breast cancer risk following pregnancy (54–56).

The proportion of HR+ luminal cells predicts basal cell paracrine signaling state

Previously, we demonstrated that parity was associated with a change in the per-cell hormone signaling response of HR+ luminal cells (Fig. 4D), whereas increased BMI was associated with a reduction in the proportion of HR+ cells in the luminal compartment (Fig. 3). As the effects of ER/PR activation are controlled by paracrine signaling from HR+ luminal cells to other cell types, we reasoned that the overall proportion of HR+ luminal cells in the epithelium was a second mechanism that could affect the hormone responsiveness of the breast. While the downstream effects of hormone receptor activation in HR+ luminal cells are controlled by a complex set of signaling networks, previous work has shown that HR+ cells signal directly to basal cells via WNT (25). Since WNT proteins generally form short-range signaling gradients (57), we predicted that the paracrine signaling response in basal cells would be particularly sensitive to reductions in the proportion of HR+ luminal cells. Consistent with this idea, while the basal cell “paracrine response” metagene was linearly associated with the hormone signaling state of HR+ luminal cells (R2 = 0.57, p < 3e-6), positive outliers tended to have a greater proportion of HR+ luminal cells and negative outliers tended to have a lower proportion of HR+ luminal cells in the epithelium (Fig. 5B).

To formally test this prediction, we modeled the basal cell paracrine response as a linear response to three variables: HR+ cell hormone signaling, the frequency of HR+ cells in the epithelium, and an interaction term representing the combined effects of HR+ signaling and frequency (Signaling × Frequency). This combined model accounted for over 75% of the sample-to-sample variation across the paracrine response metagene in basal cells (Fig. 5C, fig. S16A, and table S6; p < 3e-8). Importantly, only the interaction term (Signaling × Frequency) was a significant predictor of basal cell transcriptional state (Fig. 5C and table S6), demonstrating that the basal cell paracrine response requires both hormone signaling in HR+ cells and an appreciable abundance of HR+ cells in the epithelium. Together, these results are consistent with a model in which the proportion of HR+ luminal cells in the epithelium influences the magnitude of paracrine signaling to basal cells downstream of estrogen and progesterone.
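
The structure of this regression model can be made explicit with a short sketch: response = b0 + b1·Signaling + b2·Frequency + b3·(Signaling × Frequency), fitted by ordinary least squares. The code below is an illustration under stated assumptions: the input values are synthetic and constructed so that only the interaction term matters, the function names are hypothetical, and coefficient significance tests are omitted.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_interaction_model(signaling, frequency, response):
    """OLS fit of response ~ 1 + signaling + frequency + signaling*frequency."""
    X = [[1.0, s, f, s * f] for s, f in zip(signaling, frequency)]
    p = 4
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * y for row, y in zip(X, response)) for a in range(p)]
    beta = solve_linear_system(XtX, Xty)  # normal equations
    fitted = [sum(bi * xi for bi, xi in zip(beta, row)) for row in X]
    mean_y = sum(response) / len(response)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(response, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    return beta, 1 - ss_res / ss_tot

# Synthetic per-sample values (invented): the response depends only on the product.
signaling = [0.1, 0.5, 0.9, 0.2, 0.7, 0.4, 0.8, 0.3]
frequency = [0.2, 0.6, 0.3, 0.8, 0.5, 0.1, 0.7, 0.4]
response = [0.05 + 2.0 * s * f for s, f in zip(signaling, frequency)]

beta, r_squared = fit_interaction_model(signaling, frequency, response)
print([round(b, 3) for b in beta], round(r_squared, 3))
```

When the response is driven by the product of the two predictors, the fitted main-effect coefficients are close to zero and the interaction coefficient carries the signal, which is the pattern the text interprets as a requirement for both hormone signaling and sufficient HR+ cell abundance.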

Fig. S16. Paracrine signaling to basal cells depends on the hormone signaling state of HR+ luminal cells and the proportion of HR+ luminal cells in the epithelium.

(A) Plot depicting the observed basal cell state across metagene 10 (“paracrine response”) for each sample versus the predicted values based on multiple linear regression analysis with three predictors: HR+ cell hormone signaling (HR+ metagene 8), the frequency of HR+ cells in the epithelium, and an interaction term representing the combined effects of HR+ cell signaling and frequency (Signaling × Frequency). (B) Ridge plots depicting the distribution of HR+ cell metagene 8 (“hormone signaling”) expression across samples, and quantification of the average expression in obese (BMI > 30) versus non-obese (BMI ≤ 30) samples (n = 16 samples, p < 0.31, Mann-Whitney test). (C) Ridge plots depicting the distribution of basal cell metagene 10 (“paracrine response”) expression across samples, and quantification of the average expression in nulliparous (NP) versus parous (P) samples (n = 22 samples, p < 0.003, Mann-Whitney test). (D) Volcano plot highlighting genes downregulated in basal/myoepithelial cells in both parous and obese samples.

Based on these results, we predicted that BMI would influence paracrine signaling from HR+ luminal cells to basal cells, since HR+ luminal cells are reduced in obese women (Fig. 3). Confirming this, we found that while direct hormone signaling in HR+ cells was not significantly affected by obesity (fig. S16B), the downstream basal cell paracrine response was significantly reduced in obese samples (Fig. 5D). Consistent with the reduced hormone signaling previously observed in HR+ cells from parous women (Fig. 4D), parity was also associated with a reduction in the basal cell paracrine response (fig. S16C).

Gene set enrichment analysis demonstrated that variation in the basal cell “paracrine response” metagene was driven by genes involved in contractility and cell motility (fig. S14F). To determine whether these genes were differentially expressed in obese and/or parous women, we generated a “pseudo-bulk” dataset of basal cells from each sample. Of the 195 genes significantly downregulated in parous samples and 148 genes significantly downregulated in obese samples, 68 were reduced across both groups (fig. S16D and table S7). Both parous and obese samples had decreased expression of contractility-related genes including ACTA2, ACTG2, CNN1, MYH11, MYL9, and MYLK, as well as the basement membrane proteins COL4A1 and COL14A1 (fig. S16D and table S7). Finally, consistent with the idea that parity and obesity reduce the paracrine response of basal cells to hormone signaling, expression of the WNT target genes SPP1 and WLS was also reduced in both subsets. Overall, these results are consistent with a model in which parity and BMI affect the hormone responsiveness of the breast through two distinct mechanisms: parity directly alters the hormone signaling response in HR+ luminal cells, whereas BMI indirectly affects hormone signaling by reducing the proportion of HR+ luminal cells in the mammary epithelium (Fig. 5E).
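
The pseudo-bulk aggregation and the overlap between the two downregulated gene lists can be illustrated with a minimal sketch. It assumes that pseudo-bulk profiles are formed by summing counts over cells from the same sample; the records, gene names, and gene sets below are hypothetical placeholders, and the differential expression test itself is not shown.

```python
from collections import defaultdict

# Hypothetical per-cell records for basal cells: (sample, gene, count).
cell_counts = [
    ("sample1", "gene_a", 3), ("sample1", "gene_a", 5), ("sample1", "gene_b", 1),
    ("sample2", "gene_a", 2), ("sample2", "gene_b", 4), ("sample2", "gene_b", 2),
]

def pseudo_bulk(records):
    """Sum counts over all cells from the same sample, gene by gene."""
    totals = defaultdict(lambda: defaultdict(int))
    for sample, gene, count in records:
        totals[sample][gene] += count
    return {sample: dict(genes) for sample, genes in totals.items()}

bulk = pseudo_bulk(cell_counts)
print(bulk)

# Placeholder results of two separate differential expression comparisons.
down_in_parous = {"gene_a", "gene_b", "gene_c"}
down_in_obese = {"gene_a", "gene_c", "gene_d"}
shared = down_in_parous & down_in_obese
print(sorted(shared))
```

Aggregating to one profile per sample makes the sample, rather than the cell, the unit of replication, and the set intersection corresponds to the genes reported as reduced in both groups.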

Discussion

In this study, we combine single-cell analyses, immunostaining, and computational modeling to understand the major sources of sample-to-sample heterogeneity in the human breast. First, by using single-cell measurements, we were able to separate the effects of variation in cell proportions from variation in transcriptional state. Second, we describe a computational pipeline that leverages the inter-sample transcriptional heterogeneity in our dataset to identify coordinated changes in cell signaling states across cell types. Using this approach, we identify a set of highly correlated gene expression programs representing the in situ response to hormone receptor activation in HR+ cells and downstream signaling in other cell types. Furthermore, we show that person-to-person heterogeneity in hormone responsiveness in the breast is directly linked to two factors known to modulate premenopausal breast cancer risk: reproductive history and BMI.

Pregnancy has a pronounced protective effect against breast cancer, with up to a 50% reduction in breast cancer risk for women with multiple full-term pregnancies at a young age (8). Our analysis revealed that parity is associated with a stark increase in the proportion of basal and/or myoepithelial cells within the breast epithelium. Previous work has described two tumor-protective features of myoepithelial cells: they are highly resistant to malignant transformation (58–61) and also act as a natural and dynamic barrier that prevents tumor cell invasion (62, 63). Thus, our data suggest that pregnancy protects against breast cancer risk both by decreasing the relative frequency of luminal cells—the tumor cell-of-origin for most breast cancer subtypes (58, 64, 65)—and by suppressing progression to invasive carcinoma.

Hormone exposure is another major determinant of breast cancer risk (1, 66–68). Here, we use matrix decomposition and network analysis to map the coordinated changes in cell state that occur in response to paracrine signaling from HR+ luminal cells. Strikingly, many of these changes closely mimic those seen during the pregnancy/involution cycle that have been linked to a transient increased breast cancer risk following pregnancy (54–56). First, we identify a proliferative gene signature in secretory luminal cells that is highly correlated with hormone signaling in HR+ luminal cells, consistent with previous studies demonstrating that RANKL and WNT control progesterone-mediated epithelial proliferation (69). Second, previous studies have shown that the fraction of apoptotic cells in the epithelium peaks between the late luteal and early follicular phases (52). Consistent with this, we identify subpopulations of HR+ and secretory luminal cells in the cycling premenopausal breast enriched for genes known to be upregulated during post-lactational involution (50, 70). Notably, we also observe upregulation of hypoxic gene signatures in multiple epithelial and stromal cell types that are highly correlated with hormone signaling in HR+ cells. A previous study identified these same pathways as highly enriched following involution in the mouse mammary gland. More importantly from the perspective of breast cancer risk, this “hypoxia/pro-angiogenic” signature identified breast cancers with increased metastatic activity (70), suggesting that these pathways support a permissive tumor microenvironment.

Finally, we find that paracrine signaling from HR+ cells to basal cells depends on both the per-cell transcriptional response of HR+ cells to hormones and the overall proportion of HR+ cells in the epithelium. Notably, prior pregnancy and obesity are specifically associated with a reduced risk of ER+/PR+ breast cancer in premenopausal women (11, 40), and our data support the idea that these factors lead to reduced paracrine signaling downstream of estrogen and progesterone via two distinct mechanisms. First, parity leads to a reduced per-cell hormone signaling response in HR+ luminal cells. Second, we identify a marked decrease in the ratio of HR+ cells relative to secretory luminal cells with increasing BMI. Both changes are associated with a reduced paracrine signaling response in basal cells.

In summary, these results provide a comprehensive, systems-level view of the cellular and transcriptional changes that control normal breast development and breast cancer risk in response to cycling hormones. This single-cell analysis establishes a link between hormone signaling and tumor-promoting changes in cell state across multiple cell types. Furthermore, we identify tumor-protective changes in epithelial cell proportions and hormone responsiveness with pregnancy and increased body mass. As the breast is one of the only human organs that undergoes repeated cycles of morphogenesis and involution, this study serves as a roadmap to the cell state changes associated with hormone dynamics in the human breast. Finally, it provides a foundation for similar systems-level studies dissecting how the paracrine communication networks downstream of hormone signaling are altered during ER+/PR+ breast cancer progression.

Author Contributions

L.M.M., R.J.W., and Z.J.G. conceived the project. L.M.M., J.C., R.J.W., C.S.M., and K.P. performed the sequencing experiments. C.S.M. generated aligned reads and barcode matrices, and performed sample demultiplexing. P.G. and J.C. coordinated sample acquisition and provided critical guidance for sample selection. P.G. and J.C. performed sectioning for fluorescent immunohistochemistry experiments. L.M.M. performed fluorescent immunohistochemistry and RNA-FISH experiments. L.M.M. and J.C. performed flow cytometry experiments. A.D.B. performed histopathology on tissue sections. L.M.M. analyzed and visualized the data. M.T. provided critical guidance in data analyses and computational approaches. T.T. and A.D.B. provided critical guidance in human breast biology. T.T., M.T., and Z.J.G. provided critical resources. T.A.D., M.T., T.T., and Z.J.G. supervised the project. L.M.M. and Z.J.G. wrote the manuscript. All authors reviewed and edited the manuscript.

Materials and Methods

Tissue samples and preparation

Reduction mammoplasty tissue samples were obtained from the Cooperative Human Tissue Network (CHTN, Vanderbilt University Medical Center, Nashville, TN) and Kaiser Permanente Northern California (KPNC, Oakland, CA). Core biopsy samples were provided by the Susan G. Komen Tissue Bank (KTB). Tissues were obtained as de-identified samples and all subjects provided written informed consent. When possible, medical reports or other patient data were obtained with personally identifiable information redacted. Use of breast tissue specimens to conduct the studies described above was approved by the UCSF Committee on Human Research under Institutional Review Board protocols 16-18865 and 10-01532. A portion of each sample was fixed in formalin and paraffin-embedded using standard procedures. The remainder was dissociated mechanically and enzymatically to obtain epithelial-enriched tissue fragments. Tissue was minced, followed by enzymatic dissociation with 200 U/mL collagenase type III (Worthington CLS-3) and 100 U/mL hyaluronidase (Sigma H3506) in RPMI 1640 with HEPES (Corning 10-041-CV) plus 10% (v/v) dialyzed FBS, penicillin, streptomycin, amphotericin B (Lonza 17-836E), and gentamicin (Lonza 17-518) at 37 °C for 16 h. For KTB samples, the cell suspension containing single cells and stroma was frozen and maintained at −180 °C until use. For reduction mammoplasty samples, the cell suspension was centrifuged at 400 x g for 10 min and resuspended in RPMI 1640 plus 10% FBS. Digested tissue fragments enriched for epithelial cells and associated stroma were collected after serial filtration through 150 μm and 40 μm nylon mesh strainers. Following centrifugation, tissue fragments and filtrate were frozen and maintained at −180 °C until use.

Dissociation to single cells

The day of sorting, epithelial-enriched tissue fragments from the 150 μm fraction, or total banked material for the KTB samples, were thawed and digested to single cells by trituration in 0.05% trypsin for 2 min, followed by trituration in 5 U/mL dispase (Stem Cell Technologies 07913) plus 1 mg/mL DNase I (Stem Cell Technologies 07900) for 2 min. Single-cell suspensions were resuspended in HBSS supplemented with 2% FBS, filtered through a 40 μm cell strainer, and pelleted at 400 x g for 5 min. The pellets were resuspended in 10 mL of complete mammary epithelial growth medium with 2% v/v FBS without GA-1000 (MEGM; Lonza CC-3150). Cells were incubated at 37 °C for 1 h, rotating on a hula mixer, to regenerate surface antigens.

MULTI-seq sample barcoding

Single-cell suspensions were pelleted at 400 x g for 5 min and washed once with 10 mL mammary epithelial basal medium (MEBM; Lonza CC-3151). For each sample, one million cells were aliquoted, washed a second time with 200 μL MEBM, and resuspended in 90 μL of a 200 nM solution containing equimolar amounts of anchor lipid-modified oligonucleotides (LMOs) and sample barcode oligonucleotides in phosphate buffered saline (PBS). Following a 5-minute incubation on ice with anchor-LMO/barcode, 10 μL of 2 μM co-anchor LMO in PBS was added to each sample (for a final concentration of 200 nM), and wells were mixed by gentle pipetting and incubated for an additional 5 min on ice. Following incubation, cells were washed twice in 200 μL PBS with 1% BSA and pooled together into a single 15 mL conical tube containing 10 mL PBS/1% BSA. All subsequent steps were performed on ice.

Sorting for scRNA-seq

Cells were pelleted at 400 x g for 5 min and resuspended in PBS/1% BSA at a concentration of 1 million cells per 100 μL, and incubated with primary antibodies. Cells were stained with Alexa 488-conjugated anti-CD49f to isolate basal/myoepithelial cells, PE-conjugated anti-EpCAM to isolate luminal epithelial cells, and biotinylated antibodies for lineage markers CD2, CD3, CD16, CD64, CD31, and CD45 to remove hematopoietic (CD16/CD64-positive), endothelial (CD31-positive), and leukocytic (CD2/CD3/CD45-positive) lineage cells by negative selection (Lin-). Sequential incubation with primary antibodies was performed for 30 min on ice in PBS/1% BSA, and cells were washed with cold PBS/1% BSA. Biotinylated primary antibodies were detected with a streptavidin-Brilliant Violet 785 conjugate. After incubation, cells were washed once and resuspended in PBS/1% BSA plus 1 μg/mL DAPI for live/dead discrimination. Cell sorting was performed on a FACSAria II cell sorter. Live singlet (DAPI-), luminal (DAPI-/Lin-/CD49f-/EpCAMhigh), myoepithelial (DAPI-/Lin-/CD49f+/EpCAMlow), or total epithelial (pooled luminal and myoepithelial) cells were collected for each sample as specified in table S2 and resuspended in PBS/1% BSA at a concentration of 1000 cells/μL. For Batch 4, an aliquot of MULTI-seq barcoded cells was separately stained with biotinylated-CD45/streptavidin-Brilliant Violet 785 to enrich for immune cells, and sorted CD45+ cells were pooled with the Live/singlet fraction as specified in table S2.

Antibodies and dilutions used (μL/million cells) were as follows: FITC-EpCAM (1.5 μL; BD 550257, clone AD2), APC-CD49f (4 μL; Stem Cell Technologies 10109, clone VU1D9), Biotin-CD2 (8 μL; Biolegend 313636, clone GoH3), Biotin-CD3 (8 μL; BD 55325, clone RPA-2.10), Biotin-CD16 (8 μL; BD 55338, clone HIT3a), Biotin-CD64 (8 μL; BD 555526, clone 10.1), Biotin-CD31 (4 μL; Invitrogen MHCD31154, clone MBC78.2), Biotin-CD45 (1 μL; Biolegend 304004, clone HI30), BV785-Streptavidin (1 μL; Biolegend 405249).

scRNAseq library preparation

cDNA libraries were prepared using the 10X Genomics Single Cell V2 (CG00052 Single Cell 3’ Reagent Kit v2: User Guide Rev B) or Single Cell V3 (CG000183 Single Cell 3’ Reagent Kit v3: User Guide Rev B) standard workflows as specified in table S2. Library concentrations were quantified using high sensitivity DNA Bioanalyzer chips (Agilent, 5067-4626), the Illumina Library Quantification Kit (Kapa Biosystems KK4824), and Qubit dsDNA HS Assay Kit (Thermo Fisher Q32851). Individual libraries were separately sequenced on a lane of a HiSeq4500 or NovaSeq, as specified in table S2, for an average of ~150,000 reads/cell.

Expression library pre-processing

Cell Ranger (10x Genomics) was used to align sequences, filter data and count unique molecular identifiers (UMIs). Data were mapped to the human reference genome GRCh37 (hg19). The resulting sequencing statistics are summarized in table S2. For each experimental batch, the cellranger aggr pipeline (10X Genomics) was used to normalize read depth across droplet microfluidic lanes.

Cell calling

For V2 experiments, cell-associated barcodes were defined using Cell Ranger. For V3/MULTI-seq experiments, cells were defined as barcodes associated with ≥600 total RNA UMIs and ≤20% of reads mapping to mitochondrial genes. We manually selected 600 RNA UMIs and 20% mitochondrial genes to exclude low-quality cell barcodes.
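
The V3/MULTI-seq cutoffs above amount to a two-condition filter on each barcode. A minimal sketch in Python follows; the input layout, the function name `call_cells`, and the toy barcodes are hypothetical, and the mitochondrial fraction is computed here from UMI counts as an assumption (the text states the cutoff in terms of reads).

```python
def call_cells(barcodes, min_umis=600, max_mito_frac=0.20):
    """Return barcodes with >= min_umis total UMIs and <= max_mito_frac mitochondrial fraction."""
    kept = []
    for barcode, stats in barcodes.items():
        total = stats["total_umis"]
        if total < min_umis:
            continue  # too few RNA UMIs to be called a cell
        mito_frac = stats["mito_umis"] / total
        if mito_frac <= max_mito_frac:
            kept.append(barcode)
    return sorted(kept)


# Hypothetical toy input: per-barcode UMI totals (not data from this study).
toy = {
    "AAAC": {"total_umis": 1500, "mito_umis": 150},  # 10% mitochondrial: kept
    "CCCG": {"total_umis": 450, "mito_umis": 20},    # below 600 UMIs: dropped
    "GGGT": {"total_umis": 900, "mito_umis": 270},   # 30% mitochondrial: dropped
    "TTTA": {"total_umis": 600, "mito_umis": 120},   # exactly at both cutoffs: kept
}
cells = call_cells(toy)
```

Both cutoffs are inclusive in this sketch, matching the ≥600 and ≤20% wording above.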

MULTI-seq barcode library pre-processing

Raw barcode FASTQs were converted to barcode UMI count matrices as described previously (16). Briefly, FASTQs were parsed to discard reads where: 1) the first 16 bases of read 1 did not match a list of cell barcodes generated as described above, and 2) the first 8 bases of read 2 did not align with any reference barcode with less than 1 mismatch. Duplicated UMIs, defined as reads with the same cell barcode where bases 17-26 (V2 chemistry) or bases 17-28 (V3 chemistry) of read 2 exactly matched, were removed to produce a final barcode UMI count matrix.
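
The filtering and UMI de-duplication logic can be sketched as follows. This is an illustration only, not the published MULTI-seq code: it assumes each read has already been sliced into a cell barcode, a UMI, and a sample-barcode tag, and the names `count_barcode_umis` and `max_mismatch` as well as the toy sequences are hypothetical. With `max_mismatch=0`, a tag must match a reference barcode exactly, which is one reading of "less than 1 mismatch".

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_barcode_umis(reads, cell_whitelist, ref_barcodes, max_mismatch=0):
    """Build a {cell: {sample_barcode: UMI count}} table from pre-sliced reads."""
    seen = set()  # (cell barcode, UMI) pairs already counted
    counts = defaultdict(lambda: defaultdict(int))
    for cell_bc, umi, tag in reads:
        if cell_bc not in cell_whitelist:
            continue  # read does not come from a called cell
        match = None
        for name, seq in ref_barcodes.items():
            if hamming(tag, seq) <= max_mismatch:
                match = name
                break
        if match is None:
            continue  # tag does not match any reference sample barcode
        if (cell_bc, umi) in seen:
            continue  # duplicated UMI within the same cell
        seen.add((cell_bc, umi))
        counts[cell_bc][match] += 1
    return {cell: dict(per_bc) for cell, per_bc in counts.items()}


# Hypothetical toy reads: (cell barcode, UMI, sample-barcode tag).
whitelist = {"CELL1", "CELL2"}
reference = {"Bar1": "AAAAAAAA", "Bar2": "CCCCCCCC"}
reads = [
    ("CELL1", "UMI01", "AAAAAAAA"),
    ("CELL1", "UMI01", "AAAAAAAA"),  # duplicate UMI, counted once
    ("CELL1", "UMI02", "AAAAAAAA"),
    ("CELL2", "UMI01", "CCCCCCCC"),
    ("CELL3", "UMI05", "AAAAAAAA"),  # cell barcode not in whitelist
    ("CELL2", "UMI09", "CCCCGGGG"),  # tag matches no reference barcode
]
matrix = count_barcode_umis(reads, whitelist, reference)
```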

Sample demultiplexing

Barcode UMI count matrices were used to classify cells using the MULTI-seq classification suite (16). In Batch 3, sample RM192 was poorly labeled for the lane of cells from the epithelial cell sort gate. Therefore, to reduce spurious doublet calls in this dataset, we manually set UMI counts which were <10 for this barcode to zero. For all experiments, raw barcode reads were log2-transformed and mean-centered, the top and bottom 0.1% of values for each barcode were excluded, and a probability density function (PDF) was constructed for each barcode. Next, all local maxima were computed for each PDF, and the negative and positive maxima were selected. To define a threshold between these two maxima, we iterated across 0.02-quantile increments and chose the quantile maximizing the number of singlet classifications, defined as cells surpassing the threshold for a single barcode. Multiplets were defined as cells surpassing two or more thresholds, and unlabeled cells were defined as cells surpassing zero thresholds. Unclassified cells were removed and the procedure was repeated until all remaining cells were classified.
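
The threshold sweep and the singlet/multiplet/unlabeled rules can be illustrated with a short sketch. It assumes the negative and positive modes of each barcode distribution are already known and that the threshold for quantile q is interpolated linearly between them; the function names and toy values are hypothetical, and the iterative re-classification of unclassified cells is not shown.

```python
def classify(cells, thresholds):
    """Label each cell by how many barcode thresholds it surpasses."""
    calls = {}
    for cell, values in cells.items():
        hits = [bc for bc, value in values.items() if value > thresholds[bc]]
        if len(hits) == 1:
            calls[cell] = hits[0]       # singlet: exactly one barcode
        elif len(hits) >= 2:
            calls[cell] = "Multiplet"   # two or more barcodes
        else:
            calls[cell] = "Unlabeled"   # no barcode
    return calls


def sweep_quantiles(cells, modes, step=0.02):
    """Return the quantile (and its calls) that maximizes the number of singlets."""
    best_q, best_n, best_calls = None, -1, None
    for i in range(1, int(round(1 / step))):
        q = i * step
        # Assumed rule: threshold sits a fraction q of the way from the
        # negative mode to the positive mode of each barcode.
        thresholds = {bc: lo + q * (hi - lo) for bc, (lo, hi) in modes.items()}
        calls = classify(cells, thresholds)
        n_singlets = sum(c not in ("Multiplet", "Unlabeled") for c in calls.values())
        if n_singlets > best_n:
            best_q, best_n, best_calls = q, n_singlets, calls
    return best_q, best_calls


# Hypothetical normalized barcode signals for four cells and two barcodes.
modes = {"A": (0.0, 4.0), "B": (0.0, 4.0)}
cells = {
    "c1": {"A": 3.8, "B": 0.3},
    "c2": {"A": 0.2, "B": 3.9},
    "c3": {"A": 3.9, "B": 3.9},
    "c4": {"A": 0.1, "B": 0.2},
}
best_q, best_calls = sweep_quantiles(cells, modes)
```

Ties are broken in favor of the lowest quantile here, which is a choice made for this sketch rather than something stated in the text.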

To classify cells that were identified as unlabeled by MULTI-seq, we used the souporcell pipeline (15) to assign cells to different individuals based on single nucleotide polymorphisms (SNPs). For each dataset, we set the number of clusters (k) to the total number of samples in that experiment. To avoid local minima, souporcell restarts clustering multiple times and takes the solution that minimizes the loss function. For Batch 3, we chose the number of restarts that produced less than a 1.5% misclassification rate between MULTI-seq and souporcell singlet sample classifications (Live singlet: 30 restarts/1.2% mismatch rate; Epithelial: 75 restarts/1.5% mismatch rate). Souporcell classification performed more poorly across parameters for Batch 4 (Live singlet plus CD45+: 50 restarts/8.1% mismatch rate, 75 restarts/4.8% mismatch rate; Epithelial: 50 restarts/8.6% mismatch rate, 75 restarts/14.9% mismatch rate, 100 restarts/4.1% mismatch rate). Therefore, for these datasets we used sample classifications that were consistent across two restarts (Pooled live singlet/ CD45+: consistent calls across 50 and 75 restarts/0.4% overall mismatch rate; Epithelial: consistent calls across 50 and 100 restarts/1% overall mismatch rate) to identify high-confidence singlets.
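
The two bookkeeping steps in this paragraph, computing a mismatch rate between two sets of calls and keeping only calls that agree across two runs, can be sketched as below. The sketch assumes souporcell clusters have already been mapped to sample identities (cluster labels from separate runs are not directly comparable); all names and toy calls are hypothetical.

```python
def mismatch_rate(calls_a, calls_b):
    """Fraction of shared cells whose sample calls disagree."""
    shared = set(calls_a) & set(calls_b)
    if not shared:
        raise ValueError("no cells in common")
    mismatches = sum(calls_a[cell] != calls_b[cell] for cell in shared)
    return mismatches / len(shared)


def consistent_calls(run_a, run_b):
    """Keep only cells assigned to the same sample in both runs."""
    return {cell: sample for cell, sample in run_a.items() if run_b.get(cell) == sample}


# Hypothetical calls: one MULTI-seq reference and two clustering runs.
multiseq = {"c1": "S1", "c2": "S2", "c3": "S1", "c4": "S2"}
run_50 = {"c1": "S1", "c2": "S2", "c3": "S2", "c4": "S2", "c5": "S1"}
run_75 = {"c1": "S1", "c2": "S2", "c3": "S1", "c4": "S2", "c5": "S1"}

consensus = consistent_calls(run_50, run_75)
rate_single_run = mismatch_rate(multiseq, run_50)
rate_consensus = mismatch_rate(multiseq, consensus)
```

In this toy case the consensus drops the one cell on which the runs disagree, so the mismatch rate against MULTI-seq falls from 0.25 to 0.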

Quality control, dataset integration, and cell type identification using Seurat

Cell type identification was performed using the Seurat package (version 3.0.0) in R (71). To identify doublets from the same sample that would not be identified by MULTI-seq or souporcell, we filtered each lane to remove cells with greater than 20% of reads mapping to mitochondrial genes and ran DoubletFinder (version 2.0) on each data subset (72), using parameters identified by the ‘paramSweep_v3’ function. Aggregated data for singlet cells for each batch were filtered to remove cells that had fewer than 200 genes and genes that appeared in fewer than 3 cells. Cells with a Z score of 4 or greater for the total number of genes expressed were presumed to be doublets and removed from analysis. Counts for the remaining cells were normalized to a total of 1e4 molecules per cell and log-transformed, and the top 2000 most variable genes based on variance stabilizing transformation were identified for each batch (73). Data from all four batches were integrated using the standard workflow and default parameters from Seurat v3 (71). This data integration workflow identifies pairwise correspondences between cells across datasets and uses these anchors to transform datasets into a shared expression space. Following dataset integration, the resulting batch-corrected expression matrix was scaled, and principal component (PC) analysis was performed using the identified integration genes. The top 28 statistically significant PCs as determined by visual inspection of elbow plots were used as an input for UMAP visualization and k-nearest neighbor (KNN) modularity optimization-based clustering using Seurat’s FindNeighbors and FindClusters functions.

Quantification of sample-to-sample heterogeneity

To measure how well-mixed cells from different samples were across cell type clusters, we quantified the normalized relative entropy for our dataset, weighted by cluster size (74). A cluster entropy value of 1 represents complete intermixing of samples across clusters. To measure transcriptional variation in cell state within cell types between cells from the same versus different batches and/or samples, we measured the pairwise “alignment score” between each sample/batch (75), where batches consisted of sets of samples processed on the same day (table S2). This metric examines the local neighborhood of each cell in a particular sample/batch, asks how many of its k nearest neighbors belong to a second sample/batch, and averages this over all cells. The result was normalized by the expected number of cells from each sample/batch. Notably, for repeat measurements, samples run across multiple batches were highly similar.
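
One way to compute a size-weighted cluster entropy with the property that a value of 1 means complete intermixing is sketched below. The exact definition used in the study follows reference 74 and may differ; this sketch assumes the Shannon entropy of the sample composition of each cluster, divided by its maximum possible value, and does not correct for unequal numbers of cells per sample. The function name and toy data are hypothetical.

```python
import math
from collections import Counter


def weighted_cluster_entropy(assignments):
    """Size-weighted mean of per-cluster normalized Shannon entropy of sample labels."""
    samples = {sample for _, sample in assignments}
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    max_entropy = math.log(len(samples))  # entropy of a perfectly even mixture
    by_cluster = {}
    for cluster, sample in assignments:
        by_cluster.setdefault(cluster, Counter())[sample] += 1
    total_cells = len(assignments)
    score = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        entropy = -sum((c / size) * math.log(c / size) for c in counts.values())
        score += (size / total_cells) * (entropy / max_entropy)
    return score


# Hypothetical data: cluster "A" is evenly mixed, cluster "B" is from one sample only.
toy = (
    [("A", "S1")] * 5 + [("A", "S2")] * 5
    + [("B", "S1")] * 10
)
score = weighted_cluster_entropy(toy)
```

With one fully mixed cluster and one single-sample cluster of equal size, the weighted score is 0.5.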

Testing for changes in cell type proportions and predictive modeling

We modeled the detected number of each cell type in each sample as a random count variable using a quasi-Poisson process to allow for overdispersion, with the condition being tested (e.g. parity, obesity) as a predictor and the total number of detected epithelial or luminal cells in each sample as an offset variable (76). To account for uncertainty due to variable numbers of profiled cells in each sample, we used bootstrap resampling to estimate confidence intervals associated with detection of each cell type (77). Results from 1000 bootstrap replicates were pooled using the mice::pool function in R, and the model was fit using a quasi-Poisson generalized linear model from the ‘stats’ R package. Tests for statistical significance were performed using a Wald test on the regression coefficient. Multiple hypothesis testing was corrected by controlling the false discovery rate. For the Komen Tissue Bank (KTB) data set, a quasi-Poisson model was trained on the reduction mammoplasty cohort as described above, and the ‘predict’ function in the ‘stats’ R package was used to predict the proportion of HR+ luminal cells in the KTB samples based on BMI.
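
The bootstrap step can be illustrated for a single sample: resampling cells with replacement shows how much the detected fraction of a cell type could vary given the number of cells profiled. This is a simplified percentile-interval sketch with hypothetical names and toy labels; it does not reproduce the quasi-Poisson model fit or the pooling of replicates with mice::pool described above.

```python
import random


def bootstrap_fraction_ci(labels, cell_type, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the fraction of cells with a given label."""
    rng = random.Random(seed)
    n = len(labels)
    fractions = []
    for _ in range(n_boot):
        resample = rng.choices(labels, k=n)  # sample cells with replacement
        fractions.append(resample.count(cell_type) / n)
    fractions.sort()
    lower = fractions[int(round((alpha / 2) * n_boot))]
    upper = fractions[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper


# Hypothetical sample: 200 luminal cells, 60 of them HR+ (observed fraction 0.3).
labels = ["HR+"] * 60 + ["secretory"] * 140
lower, upper = bootstrap_fraction_ci(labels, "HR+")
```

Samples with fewer profiled cells give wider intervals, which is the source of uncertainty the bootstrap is meant to capture.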

PC analysis within HR+ luminal cells

To perform principal component analysis on HR+ luminal cells, we subset out this cluster from the integrated dataset and repeated the standard workflow from Seurat v3 to identify integration genes specific to this cell type. The resulting batch-corrected expression matrices were scaled, and PC analysis was performed using the identified integration genes.

Non-negative matrix factorization of individual cell types

To identify gene expression signatures, or “metagenes” within individual cell types, we subset out raw counts data from each of the four most abundant clusters (HR+ luminal cells, secretory luminal cells, basal cells, and fibroblasts) and performed matrix factorization. We chose to perform matrix factorization independently on each cell type rather than on the combined dataset, as preliminary analyses demonstrated that the number of metagenes identified for each cell type was highly dependent on the relative sizes of each cluster in the combined dataset. To account for batch differences, we used the LIGER package in R to perform integrative NMF {Welch:2019dz, Yang:2016fu}, and performed all subsequent analyses on shared, rather than batch-specific, metagenes. To avoid identification of gene signatures dominated by highly-expressed transcripts, we normalized the raw counts matrix for each cell based on its total expression, multiplied by a scale factor of 1e4, and log-transformed and scaled the result without centering. The resulting dataset was decomposed using the standard workflow and default parameters from LIGER. To estimate the optimum choice of rank K (i.e. number of NMF components) for each cell type, we used the suggestK function in the LIGER package to calculate the Kullback-Leibler (KL) divergence of metagene loadings across a range of K values, and identified the elbow point on this curve.
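
The normalization that precedes factorization can be sketched in a few lines. Two details are assumptions of this sketch rather than statements from the text: the log transform is taken as log(1 + x), and scaling without centering is taken to mean dividing each gene by its root mean square across cells (with an n - 1 denominator, as R's `scale` does when `center = FALSE`). The function names and toy counts are hypothetical.

```python
import math


def normalize_counts(counts, scale_factor=1e4):
    """Per-cell total-count normalization followed by log(1 + x)."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        normalized.append([math.log1p(c / total * scale_factor) for c in cell])
    return normalized


def scale_without_centering(matrix):
    """Divide each gene (column) by its root mean square; values stay non-negative."""
    n_cells, n_genes = len(matrix), len(matrix[0])
    scaled = [row[:] for row in matrix]
    for g in range(n_genes):
        rms = math.sqrt(sum(row[g] ** 2 for row in matrix) / (n_cells - 1))
        for i in range(n_cells):
            scaled[i][g] = matrix[i][g] / rms if rms > 0 else 0.0
    return scaled


# Hypothetical raw counts: 3 cells (rows) by 3 genes (columns).
raw = [
    [10, 0, 90],
    [5, 5, 0],
    [0, 20, 20],
]
scaled = scale_without_centering(normalize_counts(raw))
```

Skipping the centering step keeps every entry non-negative, which NMF requires of its input.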

Jensen-Shannon distance to quantify sample-to-sample variability in hormone signaling

To quantify variation in expression of the “hormone signaling” metagene in HR+ luminal cells (HR+ metagene 8), we performed the following workflow. First, we used the cell loadings across HR+ metagene 8 for each sample to compute kernel density estimations using the ‘density’ function in the ‘stats’ R package. Second, we used the ‘JSD’ function in the ‘philentropy’ R package (78) to measure the pairwise Jensen-Shannon divergence between samples. Third, we converted this to a distance metric (Jensen-Shannon Distance, JSD) by taking the square root and performed hierarchical clustering using the ‘hclust’ function in the ‘stats’ R package, using ‘ward.D2’ linkage. The similarity between samples was plotted on a heatmap as (1-JSD).
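
The three steps (density estimate, Jensen-Shannon divergence, square root) can be sketched without external packages. This illustration uses a Gaussian kernel on a fixed grid with a hand-picked bandwidth, whereas R's `density` selects a bandwidth automatically, and it uses base-2 logarithms so that the divergence lies between 0 and 1. The function names, grid, bandwidth, and toy loadings are all hypothetical.

```python
import math


def gaussian_kde(values, grid, bandwidth):
    """Gaussian kernel density on a grid, normalized to sum to 1."""
    density = [
        sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]
    total = sum(density)
    return [d / total for d in density]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_distance(p, q):
    """Square root of the divergence, which is a proper distance metric."""
    return math.sqrt(max(js_divergence(p, q), 0.0))


# Hypothetical per-cell metagene loadings for three samples.
grid = [i / 50 for i in range(51)]
sample_a = gaussian_kde([0.10, 0.20, 0.15, 0.30], grid, bandwidth=0.05)
sample_b = gaussian_kde([0.12, 0.22, 0.18, 0.28], grid, bandwidth=0.05)
sample_c = gaussian_kde([0.80, 0.90, 0.85, 0.70], grid, bandwidth=0.05)

d_ab = js_distance(sample_a, sample_b)  # similar loadings: small distance
d_ac = js_distance(sample_a, sample_c)  # well-separated loadings: distance near 1
```

A matrix of such pairwise distances is what would then be passed to hierarchical clustering, and 1 minus the distance gives the similarity plotted on the heatmap.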

Metagene network analysis

To identify sets of gene expression programs that co-varied across samples, we first decomposed each cell type into a set of distinct gene expression programs, or “metagenes”, using NMF as described above. We then quantified the average expression of each metagene in each sample and constructed a weighted network of coordinated gene expression programs based on the pair-wise Pearson correlations between metagenes. To account for correlations driven by outlier samples, we used bootstrap resampling to estimate confidence intervals associated with each correlation coefficient. The resulting Pearson correlation matrix was transformed into a weighted adjacency matrix by setting all Pearson correlation coefficients less than 0.5 or with p-values greater than 0.05 to zero. Finally, we identified modules of highly correlated gene expression programs using the infomap community detection algorithm in the ‘igraph’ package in R (43). We chose this flow-based community detection algorithm in order to maximize information flow within clusters. Results using the modularity-based Louvain clustering algorithm were identical except that a small community consisting of three metagenes was merged with the “involution” module.
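
The construction of the weighted adjacency from per-sample metagene averages can be sketched as follows. Only the correlation cutoff is applied here; the p-value filter, the bootstrap confidence intervals, and the community detection step are omitted. The function names, metagene labels, and values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def weighted_adjacency(metagene_means, min_r=0.5):
    """Return {(metagene_a, metagene_b): r} for pairs with r >= min_r."""
    edges = {}
    for a, b in combinations(sorted(metagene_means), 2):
        r = pearson(metagene_means[a], metagene_means[b])
        if r >= min_r:
            edges[(a, b)] = r  # weaker correlations are treated as zero (no edge)
    return edges


# Hypothetical average metagene expression across five samples.
means = {
    "HR_m8": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Basal_m10": [2.0, 4.0, 5.0, 8.0, 10.0],
    "Fib_m3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = weighted_adjacency(means)
```

The resulting edge list is the kind of input a community detection routine would take to find modules of co-varying metagenes.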

Gene set enrichment analysis

To identify marker genes statistically associated with each metagene, we used multiple least squares regression of normalized (z-scored) gene expression against the cell loading matrix for each metagene (79). This results in a vector of regression coefficients representing the strength of the relationship between expression of a particular metagene and scaled expression of each gene. The resulting ranked gene lists were analyzed by gene set enrichment analysis, using the ‘fgsea’ package in R (80).

Fluorescent Immunohistochemistry

For immunofluorescent staining, formalin-fixed paraffin-embedded tissue sections were deparaffinized and rehydrated using standard methods. Endogenous peroxidases were blocked using 3% hydrogen peroxide in PBS, and antigen retrieval was performed in 0.1 M citrate buffer pH 6.0. Sections were blocked for 5 min at room temperature using Lab Vision Ultra-V block (Thermo TA-125-UB) and rinsed with TNT wash buffer (1X Tris-buffered saline with 5 mM Tris-HCl and 0.5% TWEEN-20). Primary antibody incubations were performed for 1 hour at room temperature or overnight at 4°C. Sections were washed three times for 5 min each with TNT wash buffer, incubated with Lab Vision UltraVision LP Detection System HRP Polymer (Thermo Fisher TL-060-HL) for 15 min at room temperature, washed, and incubated with one of three colors of TSA amplification reagent at a 1:50 dilution. After tyramide signal amplification, antibody complexes were removed by boiling in citrate buffer, followed by blocking and incubation with additional primary antibodies as above. Finally, sections were rinsed with deionized water and mounted using Vectashield HardSet Mounting Media with DAPI (Vector H-1400). Immunofluorescence was analyzed by spinning disk confocal microscopy using a Zeiss Cell Observer Z1 equipped with a Yokogawa spinning disk and running Zeiss Zen Software.

Antibodies, TSA reagents, and dilutions used are as follows: p63 (1:2000; CST 13109, clone D2K8X), KRT7 (1:4000; Abcam AB68459, clone EPR1619Y), KRT23 (1:2000; Abcam AB156569, clone EPR10943), ER (1:4000; Thermo Scientific RMM-9101-S, clone SP1), PR (1:3000; CST 8757, clone D8Q2J), TCF7 (1:2000; CST 2203, clone C63D9), FITC-TSA (2 min; Perkin Elmer NEL701A001KT), Cy3-TSA (3 min; Perkin Elmer NEL744001KT), Cy5-TSA (7 min; Perkin Elmer NEL745E001KT).

Morphometric analysis and geometric modeling

Formalin-fixed paraffin-embedded tissue sections were immunostained for the pan-luminal marker KRT7, counterstained with DAPI and imaged as described above. Images containing lobular tissue were acquired randomly, and the area and perimeter of the KRT7-positive luminal layer of each acinus were analyzed in ImageJ. To reduce noise and remove small gaps in KRT7 fluorescence, we applied a closing filter from the MorphoLibJ plugin with a 2-pixel (1.33 μm) radius disk (81). The resulting image was smoothed by applying a Gaussian filter with sigma 5 pixels (3.33 μm), and binarized using the default thresholding algorithm in ImageJ. Finally, individual acini with visible lumens were manually selected and the area (A), perimeter (P), and circularity of the KRT7-positive region were measured for each structure. To estimate the average diameter (d) and luminal thickness (w) of each acinus, we used area and perimeter measurements to fit a circle containing a hollow lumen to each structure. Based on these results, we implemented a geometric model in which each acinus was represented as a hollow circle with shell thickness that was linearly related to diameter (d). To estimate the linear relationship between w and d, we performed linear regression analysis using measurements from all structures with a circularity greater than 0.75 (n = 55 acini from 15 samples).
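
A hollow-circle fit of this kind can be written in closed form under one assumption that the text does not state: that the measured perimeter P is the outer boundary only. Then d = P/π, and the shell area A = π w (d − w) gives w = (d − sqrt(d² − 4A/π))/2. The sketch below applies this and then fits w against d by ordinary least squares; the function names and toy measurements are hypothetical, and the circularity filter is not shown.

```python
import math


def fit_hollow_circle(area, perimeter):
    """Estimate outer diameter d and shell thickness w from shell area and outer perimeter."""
    d = perimeter / math.pi
    discriminant = d ** 2 - 4 * area / math.pi
    if discriminant < 0:
        raise ValueError("area is larger than a filled circle with this perimeter")
    w = (d - math.sqrt(discriminant)) / 2  # smaller root, so that w <= d / 2
    return d, w


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Hypothetical acini generated from w = 2 + 0.1 * d (units: micrometers).
true_d = [40.0, 60.0, 80.0, 100.0]
true_w = [2 + 0.1 * d for d in true_d]
measurements = [
    (math.pi * w * (d - w), math.pi * d)  # (shell area, outer perimeter)
    for d, w in zip(true_d, true_w)
]

fitted = [fit_hollow_circle(a, p) for a, p in measurements]
slope, intercept = fit_line([d for d, _ in fitted], [w for _, w in fitted])
```

If the measured perimeter instead included the inner lumen boundary, the relation between P and d would change and the formulas above would need to be adjusted accordingly.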

Pseudo-bulk differential gene expression analysis

To identify genes differentially expressed between samples from parous and nulliparous or obese and non-obese individuals in specific cell types, we constructed pseudo-bulk datasets consisting of the summed raw read counts across all single HR+ luminal cells or basal cells for each batch and sample. We restricted our analysis to samples/batches that had at least 100 cells of the cell type of interest. Each dataset was then randomly down-sampled to the lowest library size, and differential expression analysis was performed using DESeq2 (version 1.18.1) to test for genes differentially expressed between samples from obese (BMI ≥ 30) and non-obese (BMI < 30) or parous and nulliparous individuals, using batch as a covariate (82). As certain samples were sequenced across more than one batch (table S2), replicates of the same sample from different batches were combined using the collapseReplicates function. False discovery rate corrected p-values were calculated using the Benjamini-Hochberg procedure (83).
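
The first two steps, summing counts per sample and down-sampling every pseudo-bulk library to the smallest library size, can be sketched as below. The down-sampling here draws individual counts without replacement, which is one common approach and an assumption of this sketch; the minimum-cell filter and the DESeq2 analysis itself are not shown. The function names, sample identifiers, and counts are hypothetical.

```python
import random


def pseudobulk(cells):
    """Sum raw counts per gene across all cells of each sample."""
    bulk = {}
    for sample, counts in cells:
        gene_totals = bulk.setdefault(sample, {})
        for gene, count in counts.items():
            gene_totals[gene] = gene_totals.get(gene, 0) + count
    return bulk


def downsample_to_smallest(bulk, seed=0):
    """Randomly thin every library to the size of the smallest one."""
    rng = random.Random(seed)
    target = min(sum(genes.values()) for genes in bulk.values())
    thinned = {}
    for sample, genes in bulk.items():
        names = sorted(genes)
        # One entry per counted molecule, then sample without replacement.
        pool = [name for name in names for _ in range(genes[name])]
        picked = rng.sample(pool, target)
        thinned[sample] = {name: picked.count(name) for name in names}
    return thinned, target


# Hypothetical single-cell counts: (sample id, {gene: raw count}).
cells = [
    ("S1", {"ACTA2": 4, "MYLK": 1}),
    ("S1", {"ACTA2": 6, "MYLK": 3}),
    ("S2", {"ACTA2": 2, "MYLK": 2}),
    ("S2", {"ACTA2": 1, "MYLK": 3}),
]
bulk = pseudobulk(cells)
thinned, target = downsample_to_smallest(bulk)
```

The sample with the smallest library is left unchanged because all of its counts are drawn.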

RNA FISH analysis of ESR1 transcripts

Combined RNA FISH and immunofluorescence analysis of estrogen receptor transcript (RNAscope Probe Hs-ESR1; ACD 310301) and protein (anti-ER; Thermo RMM-9101-S, clone SP1) was performed using the RNAscope in situ hybridization kit (RNAscope Multiplex Fluorescent Reagent Kit V2, ACD 323100) according to the manufacturer’s instructions and fluorescent immunohistochemistry protocol outlined above with the following modifications. Immunostaining for ER was performed prior to in situ hybridization, using the hydrogen peroxide and antigen retrieval solutions supplied with the RNAscope kit and the mildest recommended conditions. After ER immunostaining and tyramide signal amplification, in situ hybridization for ESR1 was performed according to the manufacturer’s instructions, followed by immunostaining for KRT7 as described above. For all RNA FISH experiments, we used positive (PPIB) and negative controls (DAPB) to verify staining conditions and probe specificity.

Data and code availability

Submission of raw gene expression and barcode count matrices to the Gene Expression Omnibus is in process. For inquiries contact authors.

Supplemental Tables

Table S1. Donor information for reduction mammoplasty samples and list of samples used for scRNAseq, FACS, and immunostaining experiments.

Table S2. Summary statistics for single-cell RNA sequencing of twenty-eight reduction mammoplasty samples and seven Komen Tissue Bank samples.

Table S3. Multiple linear regression analysis of the percentage of basal cells in the epithelium as measured by FACS.

Table S4. Association of the 20 highest-loading genes in PC1 for HR+ luminal cells with estrogen signaling, progesterone signaling, or the luteal phase of the menstrual cycle.

Table S5. Canonical hormone-responsive genes differentially expressed in HR+ luminal cells between parous and nulliparous samples.

Table S6. Multiple linear regression analysis of the basal paracrine response (metagene 10) in response to three predictors: HR+ cell hormone signaling (HR+ metagene 8), the frequency of HR+ cells in the epithelium, and an interaction term representing the combined effects of HR+ signaling and frequency (Signaling × Frequency).

Table S7. Genes differentially expressed in basal cells between parous versus nulliparous samples or obese (BMI ≥ 30) versus non-obese (BMI < 30) samples.

Acknowledgments

We thank Drs. Tom Norman and Jonathan Weissman for technical support and for generously providing access to equipment and computing resources. Sequencing was performed in the Center for Advanced Technology at UCSF. Tissue samples were provided by the Cooperative Human Tissue Network (CHTN), which is funded by the National Cancer Institute. Other investigators may have received specimens from the same subjects. Samples from the Susan G. Komen Tissue Bank at the IU Simon Cancer Center were used in this study. We thank contributors, including Indiana University who collected samples used in this study, as well as donors and their families, whose help and participation made this work possible. This research was supported in part by grants from the Department of Defense Breast Cancer Research Program (W81XWH-10-1-1023 and W81XWH-13-1-0221), NIH (U01CA199315 and DP2 HD080351-01), the NSF (MCB-1330864), and the UCSF Center for Cellular Construction (DBI-1548297), an NSF Science and Technology Center, to Z.J.G. Z.J.G. is a Chan-Zuckerberg BioHub Investigator. L.M.M. is a Damon Runyon Fellow supported by the Damon Runyon Cancer Research Foundation (DRG-2239-15).

Footnotes

  • This version of the manuscript represents our updated findings based on new analyses. All figures and text have been substantially revised.

Posted October 21, 2020.