High Throughput Transcriptomics to Understand Chemical Drivers of Aggressive Breast Cancer Subtypes

Background The impact of chemical exposures on breast cancer progression is poorly characterized and may influence the development of more severe and aggressive subtypes. Objectives There is a suite of toxicants, including metals, pesticides, and personal care product compounds, which are commonly detected at high levels in US Center for Disease Control’s National Health and Nutrition Examination Survey (NHANES) chemical biomarker screens. To characterize the impact of these toxicants on breast cancer pathways, we performed high throughput dose-response transcriptomic analysis of toxicant exposed breast cells. Methods We treated non-tumorigenic mammary epithelial cells, MCF10A, with 21 chemicals at four doses (25nM, 250nM, 2.5µM, 25µM) for 48 hours. We conducted RNA-sequencing for these 408 samples, adapting the PlexWell plate-based RNA-sequencing method to analyze changes in gene expression resulting from these exposures. For each chemical, we calculated gene and biological pathway specific benchmark doses using BMDExpress2, identifying differentially expressed genes and generating the best fit benchmark dose models for each gene. We employed enrichment testing to test whether each chemical’s upregulated or downregulated genes were over-represented in a biological process or pathway. We contextualized benchmark doses relative to human population biomarker concentrations in NHANES. Results Overall, significant changes in gene expression varied across doses of each chemical and benchmark dose modeling revealed dose-responsive alterations of thousands of different genes. Comparison of benchmark data to NHANES chemical biomarker concentrations indicated an overlap between actual exposure levels and levels sufficient to cause a gene expression response. Enrichment and cell deconvolution analyses showed benchmark dose responses correlated with changes in cancer and breast cancer related pathways, including induction of basal-like characteristics for some chemicals, including p,p’-DDE, lead, copper, and methyl paraben. Discussion These analyses revealed that these 21 chemicals induce significant changes in pathways involved in breast cancer initiation and progression at human exposure relevant doses.


Introduction
There are an estimated 2.3 million new cases of breast cancer diagnosed globally and approximately 685,000 people die from breast cancer each year . 1 Breast cancer is a heterogeneous disease -tumors can be grouped into subtypes based on their molecular signature and expression of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2). 2 Prognosis and outcomes vary based on subtype. Triple negative breast cancers, those which do not express ER, PR, or HER2, have the worst overall survival rates. 3 Individuals diagnosed with metastatic breast cancer continue to have very poor survival, less than 30% after five years, regardless of subtype. 4 Identifying the factors which promote these aggressive breast cancers, such as metastatic and/or triple negative breast cancers, is of high public health priority. In general, high penetrance genetic risk factors are only thought to explain approximately 5-10% of all breast cancers. 5 As such, there is likely a large impact of the environment on aggressive breast cancer risk. There is an urgent need to identify potentially modifiable risk factors, such as chemical exposures, which promote aggressive breast cancers so that we can develop new strategies for prevention and targeted therapies.
As aggressive breast cancers form, they acquire the Hallmarks of Cancer, including sustained proliferative signaling, the activation of invasion and metastasis pathways, immortality, and the evasion of growth suppression, the reactivation of stem cell pathways, and the acquisition of cellular plasticity. [6][7][8] As an example, triple negative breast cancers and breast cancers which are more likely to metastasize are enriched for expression of stem cell associated pathways. [9][10][11][12][13] Triple negative breast cancers also often present with a basal-like phenotype. 14 Experimental data surprisingly suggest that these basal-like breast cancers develop from dysregulated luminal cells, where these cells acquire phenotypic plasticity and shift from a luminal-like to a basal-like expression pattern. 15,16 Identifying chemicals which impact the development of aggressive breast cancers through the promotion of these various hallmarks, including altered stemness and phenotypic plasticity pathways, would provide novel insights into the mechanisms by which environmental exposures impact breast cancer risk and outcomes.
There are hundreds or thousands of chemical exposures which could impact breast cancer risk, and a systematic and rapid method for evaluation is essential to identify those which may alter aggressive breast cancer hallmarks at human relevant doses. 5 Here, we developed a high throughput transcriptomic platform for the unbiased assessment of chemical effects in vitro. We quantified the dose-dependent effects of 21 chemical exposures in the commonly used in vitro breast carcinogenesis model, non-tumorigenic MCF10A cells. We prioritized the chemicals based on them being commonly detected in chemical biomarker screens, with documented exposure disparities across demographic groups, and with putative links to breast cancer. 5,[17][18][19][20][21][22][23] Across over 400 RNA-sequencing reactions, we characterize the dose-response effects of these chemicals on individual genes, breast cancer hallmarks, and on cellular plasticity phenotypes. We contextualize our dose-response data with respect to detectable biomarker concentrations in a representative sample of the US population. These results highlight putative chemical contributors to aggressive breast cancers which occur at human relevant doses and provide a methodological framework for the evaluation of chemical exposures on biological pathways associated with aggressive breast cancers.
with 5% CO 2 for 24 hours prior to dosing. Cells were then exposed to test chemicals for 48 hours with 4 biological replicates per dose. Experiments were run in 2 batches to include experimental replicates for all chemical-dose combinations.
plexWell RNA-Sequencing Overview: To conduct high throughput RNA-sequencing, we developed an adapted version of the plexWell (SeqWell, Beverly, MA, USA) plate-based sequencing method to make the method applicable for high throughput in vitro toxicological analysis. In the plexWell protocol, sequencing reactions are prepared in custom 96 well plates, where each well contains a unique oligo barcode. Plates are also assigned a unique oligo barcode. Cells are lysed and cDNA is directly prepared from the cellular lysate. A tagmentation reaction adds the well specific barcode to the cDNA from each well. The tagmented cDNAs are then pooled prior to library preparation, where plate specific barcodes can also be incorporated for ultrahigh sample multiplexing prior to sequencing. For example, 384 wells of treated samples would be processed across 4 custom 96 well plates, where each of the samples receives a well specific barcode and each of the 4 plates receives a plate specific barcode. This leads to each of the 384 samples receiving a unique combination of well and plate barcodes, which can be bioinformatically deconvoluted following sequencing. This protocol decreases the cost of library preparation by approximately 90% relative to traditional RNA-sequencing, enabling unbiased high throughput toxicological analyses.
Our key modification to the manufacturer protocol to streamline the process was to avoid the step of RNA purification prior to cDNA preparation. The plexWell protocol calls for the input of 1uL of RNA for cDNA preparation. To eliminate the step of RNA extraction, we devised a method of lysing the cells directly in the 384-well microplate. After 48 hours of incubation with the test chemicals, the cells were washed twice with 40uL of 1X HBSS. After the 2nd wash, 5uL of 1X HBSS was left behind, to avoid cell loss. We added 10uL of plexWell lysis buffer (with RNase inhibitor) to each well using a 2.5-125uL multi-channel micropipette. The buffer was pipetted up and down to mix 10 times and the cell lysates were collected in 1mL eppendorf tubes on ice to avoid RNA degradation. Cell lysis was confirmed via examination by microscopy.
plexWell cDNA preparation: The plexWell manufacturer's protocol was followed to prepare cDNA for subsequent library preparation. In brief, 1uL of cell lysate was used as an input for cDNA preparation and oligo dT annealing in a PCR 96 well plate (Dot Scientific, Burton, MI, USA). The cDNA synthesis reaction was run for 12 cycles on a Bio-Rad CFX96 Touch Real-Time PCR Detection System. Equivalent amount of MAGwise Paramagnetic Beads was added to the amplified cDNA in each well of the 96 well plate. The cDNA was allowed to bind for 5 minutes. The plate was placed on a magnet to allow the beads to pellet on the inner walls of each well. Supernantant was discarded and the beads were washed once using 80% ethanol. The cDNA was eluted using 20uL of 10mM Tris solution. Purified cDNA was stored at -20 o C for short term storage.
Picogreen assay: As per plexWell protocol, we quantified the cDNA concentration of a subset of samples using th eQuant-iT Picogreen dsDNA Assay kit (Thermo Fisher, Waltham, MA, USA).
Standards ranging from 25pg/mL to 25ng/mL were prepared using the Lambda Standard DNA (100ug/mL) provided in the kit. 1uL of prepared dsDNA was diluted in 99uL of 1X TE buffer to The pellet was washed two times with 80% Ethanol. 40uL of 10mM Tris was used to elute out purified SB reaction pool. Next, the purified SB reaction pool was labelled with an i5 index, referred to as a Pool Barcode, using a tagmentation reaction. Post i5 tagging, an equivalent amount of MAGwise Paramengnetic beads was added to the PB (Pool Barcode) reaction. The DNA was allowed to bind to the beads. The beads then formed a pellet on the magnetic stand and was washed two times using 80% ethanol. Purified PB product was amplified using Bio-Rad CFX96 Touch Real-Time PCR Detection System for 12 cycles. Post amplification, the total volume for the reaction was measured and the DNA was diluted to a total volume of 205uL with 10mM Tris. 200uL of the diluted product was transferred to a new 1.5mL LoBind tubes for purification. 5uL of the unpurified product was stored as control. 0.8 equivalents of MAGwise paramagnetic beads were added to the PB product and DNA was allowed to bind to the beads.
The beads were allowed to form a pellet on the inner wall of the tube. The pellet was washed two times using 80% ethanol. Purified library was eluted using 32uL of 10mM of Tris. 28uL of the purified product was transferred to a new 1.5mL of LoBind tube. 4 libraries were prepared, each having a unique Pool Barcode. Purified libraries were stored at -20 o C for short term storage.
Library QC was done on the Agilent Bioanalyzer (High Sensitivity DNA 5000 kit) at the Advanced Genomics Core.

RNA-Sequencing and Data Processing:
Libraries were sequenced at Illumina NovaSeq 6000 l at the Advanced Genomics Core. FASTQ reads were demultiplexed back into individual samples based on the i7 and i5 indices. Sequencing data were transferred to the University of Michigan Great Lakes high performance computing cluster for analysis. Sequencing read quality was assessed via FastQC and MultiQC. Reads were aligned to a splice junction aware build of the human genome (GRCh38) using STAR. Aligned reads were assigned to genes using featureCounts, where multimapping and multi-overlapping reads were not counted.

Differential Gene Expression Testing:
Read count matrices from featureCounts were loaded into edgeR, and samples with fewer than 1,000,000 mapped reads were excluded from downstream analysis. Genes with low expression were excluded from analysis using the edgeR filterByExpr() function with default settings. Normalization factors and dispersion were calculated prior to generating a log2-transformed normalized counts per million (cpm) matrix for downstream analysis. Differential gene expression between each treatment and the relevant controls was calculated using quasi-likelihood negative binomial generalized log-linear modeling in edgeR. Genes were considered differentially expressed between a treatment and control at a false discovery rate (FDR) adjusted p-value less than 0.05.

Normalized counts per million reads were imported into BMDExpress and prefiltered using
One-Way ANOVA for significance at a p-value of < 0.05 to identify genes showing significant increasing or decreasing concentration responses. To determine concentration-response relationships, the filtered data were modeled in BMDExpress with hill, power, linear, polynomial (2°, 3°, 4°), and exponential (2, 3, 4, 5) models and the best fit model was chosen. The benchmark response (BMR) was set to 1 standard deviation relative to control response, maximum iterations of the model was set to 250, a confidence level of 0.95 was used, and a power was restricted to greater than or equal to 1 for the power model. Best fit models were chosen for each dose response relationship per gene using nested chi square to select for the best model followed by lowest Akaike information criterion (AIC). Hill models were flagged if its 'k' parameter was smaller than the lowest positive dose and flagged hill models were included in the data. Benchmark concentrations with values above the highest noncytotoxic dose (25 µM) were excluded from further analysis. We conducted "Defined Category" analyses to additionally assess the dose-response relationship to the Molecular Signature Database "Hallmark" gene sets, embryonic stem cell genes, breast cancer molecular signatures, and breast stem cell genes. [24][25][26][27] All original expression data, ANOVA filtered results, BMD results, and subsequent BMDExpress analysis for all chemicals are included in a .bm2 file available in supplementary information.
For hypergeometric enrichment testing of BMD data, we used the R package HypeR. 28 29,30 Background gene expression included all human genes (23467). Results were filtered for significance at an FDR of < 0.05 and visualized by heatmap using heatmap.2 function from the R package gplots.

Comparison of NHANES exposure levels to benchmark doses:
We compared bioactive doses of the assayed chemicals to human biomarker concentrations measured as part of the National Health and Nutrition Examination Survey following our previously established methodology. 21 NHANES is designed to understand the health status of the US population, including a substantial number of chemical biomarker measurements in urine, blood, and serum. For this analysis, we used our curated NHANES dataset, which contains information on chemical biomarker concentrations of the 21 assessed chemicals in up to 57786 women recruited from 1999-2018. 31 Linkage between chemicals used in vitro and their corresponding exposure biomarkers in NHANES are shown in Supplemental Table 2

Secondary analysis of normal breast single cell data for cell type composition prediction:
To quantify whether treatments altered the cellular composition of our samples, we performed bioinformatic deconvolution of our bulk RNA-seq data based on a single cell atlas of the normal human mammary gland (Pal et al., 2021), where 10 normal mammary gland samples were processed for single cell analysis using the 10x Genomics Chromium platform. 29 Sample specific count matrices of the single cell RNA-seq data were downloaded from the Gene Expression Omnibus and processed via Seurat. Briefly, we excluded cells which express fewer than 200 or more than 5000 genes, along with cells with greater than 15% mitochondrial reads. Data were log-normalized and scaled, and then clustered with 0.02 resolution, following Pal et al's methods, to identify four cell clusters. Clusters were then assigned to "luminal progenitor", "mature luminal", and "myoepithelial" identities based on marker gene expression. Gene expression signatures for each of the cell types were calculated using the FindAllMarkers() function in Seurat.
We then used these single cell gene expression data to estimate cell type proportions in our data

Results
To understand the impacts of exposure disparities chemicals on dysregulated breast cancer biology, we treated non-tumorigenic MCF10A cells with chemicals found through previous analysis of human population biomonitoring data that represent exposures from metals, pesticides, personal care products, and other industrial and environmental sources, including per-and polyfluoroalkyl substances. 19,21 After filtering expression data for genes with significant alterations, fold change and significance of gene expression varied substantially by chemical and by dose ( Figure 1, Table 1 Genes related to breast cancer were commonly found to be differentially expressed by this panel of chemicals at various doses. Of note, G protein alpha subunit (GNAS) , a gene related to breast cancer cell proliferation and migration, expression changes were observed across 28 exposures. 32 Additionally, aldo-keto reductases AKR1C1, AKR1C2, and AKR1C3 expression changed across 30, 38, and 6 exposures, respectively, and these genes are associated with hormone signaling and breast cancer proliferation. 33 Keratins KRT14, KRT15, and KRT17 expression levels also changed across 17, 9, and 18 exposures, and their dysregulation is associated with breast and other cancers. [34][35][36] Serine protease inhibitor, clade E member 1 (SERPINE1), a gene associated with TNBC chemotherapy resistance, also showed changes in expression across 12 chemical treatments. 37 To characterize the dose-response relationship of each exposure on gene expression, we performed benchmark dose modeling, to define the range of benchmark doses for the different chemicals ( Figure 2). Best benchmark doses were estimated from a set of dose-response models for each gene for each chemical (Figure 2A). The total number of genes filtered by ANOVA and fit to benchmark models was different for each chemical, with some chemicals (arsenic, cadmium, and thiram) inducing dose-responsive alterations of over 1000 different genes.
Dose-response fit was assessed via examination of the best fit dose-response curves for individual gene/chemical combinations. Two example gene BMD curves are displayed in Figure   2  We next wanted to quantify the biological processes altered by chemical exposure and characterize the doses at which these processes are dysregulated. BMD modeled genes were assessed using MSigDB's Hallmark gene sets consisting of 50 biological processes. The supplemental bm2 file contains benchmark dose gene data for every hallmark category for each chemical organized under "functional classifications." Figure 4A displays an accumulation plot depicting the median BMD at which Hallmark gene sets were significantly impacted. The accumulation plot demonstrates low dose effects present for many of the Hallmarks. Figure 4B shows a heatmap of significantly changed hallmark processes -9 of the chemicals showed significant enrichment or downregulation for 1 or more of the Hallmark gene sets. Notably, many of these processes are changes that are related to carcinogenesis, including the epithelial mesenchymal transition, inflammatory response, and reactive oxygen species pathway. Epithelial to mesenchymal transition was upregulated by arsenic, cadmium, copper, lead, and thiram at median BMD dose of 7.58 µM, 9.83 µM, 0.60 µM, 0.55 µM, and 9.01 µM, respectively. To assess potential breast cancer specific alterations regulated by our chemicals of interest, we tested for enrichment with a publicly available breast carcinogenesis gene panel, BCScreen. 30 The overlapping genes within different categories of breast carcinogenesis are shown in supplemental table 4. The gene panel includes 500 genes total divided into 14 categories, with 33 genes per category other than the "mammary" category which includes 71 genes. Of these, arsenic, cadmium, and thiram showed BMD gene alterations in all 14 categories, and copper, lead, and p,p'-DDE showed at least one gene alteration in most categories.
We next wanted to test the hypothesis that disparities associated chemicals induce changes consistent with a luminal-to-basal transition, a process likely important in the development of basal breast cancers. We performed cell type deconvolution based on a normal breast single cell RNA-seq reference to predict the proportions of myoepithelial, mature luminal, and luminal progenitor cells in each of our exposures ( Figure 5 and Supplemental Figures 2-4). 29 Interestingly, PFOA showed a significant decrease in myoepithelial cell proportion from around 65% to 52% across 3 doses (Supplemental Figure 2). Arsenic, copper, and lead showed a significant increase in the proportion of myoepithelial cells across all doses compared to control, while p,p'-DDE showed a significant increase in this proportion at the 3 highest doses ( Figure 5).
These data suggest that these chemicals induce an expression signature consistent with a luminal-to-basal transition, a characteristic of aggressive basal-like breast cancers.

Discussion
Across all 21 chemicals representing exposures commonly affecting women in the United States, nearly 30,000 differentially expressed genes were found dysregulated in exposed breast cells compared to controls. While many of these alterations were found at the highest doses, lower dose effects were also seen, with median BMDs falling near the range of NHANES biomarker levels for all chemicals. This suggests that the gene expression effects seen in this study are representative of effects seen in humans, and so our analysis may link chemical exposures to incidence and characteristics of breast cancer. Interestingly, of the chosen set of chemicals, only nine have been classified by IARC as possible, probable, or definite carcinogens to humans, making this study both novel and relevant in assessing underlying chemical influence on the biology of cancers.
Specific gene alterations included some commonalities across chemical treatments showing expression changes in genes related to breast cancer progression. This included GNAS, aldo-keto reductases, keratins, and SERPINE1. GNAS has been implicated in breast cancer cell proliferation and epithelial to mesenchymal transition via modulation of the PI3K/AKT/Snail1/E-cadherin signaling axis. 32 Aldo-keto reductases are involved in estrogen and progesterone production and are thus involved in pro-proliferative signaling of hormone responsive tumors. 33 Dysregulation of keratin expression levels has also been associated with breast cancer, with increased KRT14 expression associated with invasiveness, decreased KRT15 expression associated with poor prognosis, and increased KRT17 associated with the development of TNBC. 34-36 SERPINE1 is associated with resistance of TNBC to paclitaxel and knockdown of SERPINE1 reversed resistance of breast cancer to paclitaxel through downregulation of vascular endothelial growth factor A (VEGFA). 37 In addition to common gene expression changes across chemical treatments, our enrichment analyses suggest that chemicals alter breast cell biology by several distinct mechanisms. Metals including arsenic, cadmium, copper, and lead were found to induce epithelial to mesenchymal transition, inflammatory response, and reactive oxygen species pathways, consistent with previous literature on metal-induced carcinogenesis that places particular emphasis on oxidative stress related carcinogenesis. [38][39][40] In breast cancer, arsenic and cadmium have been associated with oxidative stress related pathways of malignant transformation. 41 While these metals as well as many of the other chemicals in our analysis are known to play an endocrine disrupting role that could lead to breast tumorigenesis, these findings propose alternative routes of carcinogenesis. 42-44 p,p'-DDE downregulated epithelial to mesenchymal transition in our study but altered cell cycle processes. Previous literature has found that a mixture of organochlorine pesticides including p,p'-DDE had potential to modulate proliferation of breast cancer cell lines partially through induction of cell cycle entry. 45 Alterations in stemness features were also seen across eight chemicals analyzed, important because the cancer stem cell hypothesis states that there is a small subset of cancer stem cells that result in cancer initiation and invasion, and so alterations in stemness features can lead to increased tumorigenicity and metastasis. 46 We found stemness signatures enhanced by copper, lead, thiram, and p,p'-DDE, findings that could indicate novel mechanisms of chemically induced carcinogenesis.
These findings may be particularly relevant in the context of breast cancer disparities. We previously identified that in the US, non-Hispanic Black women are disproportionately exposed to many of the chemicals assayed here. 19 Non-Hispanic black women are also three times more likely to be diagnosed with triple negative breast cancers and approximately 40% more likely to die from a breast cancer diagnosed compared to white women. 47,48 Triple negative breast cancers often have a basal-like phenotype and may derive from a luminal to basal cell state transition. [14][15][16] Here using enrichment analysis, we found upregulation of basal subtype genes by lead and PFOA and cell type proportion analysis found upregulated basal proportions by arsenic, copper, lead, and p,p'-DDE and a decrease in basal cell proportions by PFOA. Previous literature has shown that arsenic is capable of inducting transition from luminal to basal features, but this association has not been studied for the other chemicals. 49 These findings highlight an increased need to examine chemical exposures as a modifiable risk factor for aggressive breast cancers, particularly in the context of breast cancer disparities.
One major limitation of the present study is that all exposures were done on an acute timescale (48 hours) while actual human exposures to the chemicals analyzed here are likely chronic, over the course of years or a lifetime. Another limitation is that only one cell type was used for this study, normal breast cells taken from a white woman. Ongoing work is considering treatment on a more chronic timescale as well as different breast cell lines representing diverse donors.
Additionally, future work could consider treatment of diverse breast cancer cell lines, ranging from more basal to more luminal cell types to investigate changes in luminal or basal features related to chemical exposure. Another limitation in our analyses integrating benchmark dose levels with NHANES biomarker concentrations is that the NHANES chemical biomarkers are measured in urine and blood. These levels may or may not reflect mammary tissue levels. As additional research is conducted on biomonitoring of mammary tissues, or physiologically based toxicokinetic models reflecting mammary gland concentrations are developed, these data and methods will substantially inform our understanding of how chemical exposures impact breast cancer risk.