Integrative data analysis predicts YY1 as a Cis-regulator in the 3D Cell Culture Models of MCF10A at the Stiffness Level of High Mammographic Density

Previous studies have shown that in 3D cell culture models of human mammary cells (HMEC) (i) colony organizations are heterogeneous, and (ii) ERBB2 is overexpressed in MCF10A when the stiffness of the microenvironment is increased to that of high mammographic density (MD). The goal of the current study is to identify transcription factors that regulate processes associated with the increased stiffness of the microenvironment. Two HMEC premalignant lines of MCF7 and 184A1 are cultured in 3D, colonies are imaged using confocal microscopy, and colony organizations and heterogeneity are quantified as a function of the stiffness of the microenvironment. In parallel and surrogate assays, colony organizations are profiled by transcriptomics. Transcriptome data are enriched by correlative analysis with the computed morphometric indices, from 3D culture, and a subset of transcriptome data is selected. This subset is then processed with Model-based Analysis of Regulation of Gene Expression (MARGE) and publicly available ChIP-seq data to predict regulatory transcription factors. The integrative analysis indicated that YY1 regulates ERBB2 in the 3D cell culture of MCF10A when the stiffness of the microenvironment is increased to that of high MD. Subsequent experimental validation confirmed that YY1 is only expressed at the high stiffness value of the microenvironment concomitant with the overexpression of ERBB2 in MCF10A. Furthermore, using ERBB2 positive SKBR3 cell line, co-expression of YY1 and ERBB2 is absent, which indicates that YY1 regulates tumorigenicity through multiple pathways. Author’s summary MCF10A is a premalignant immortalized human mammary cell that has been isolated from a patient with fibrocystic and lost several barriers toward transformation. In an earlier study, we showed that ERBB2 is upregulated in 3D cultures of MCF10A when the stiffness of the microenvironment is increased to that of high mammographic density. Here, we leverage publicly available ChIP-seq data to predict and validate the cis-regulator of ERBB2. Our integrated experimental and computation protocol provides a pathway for elucidating regulators that can potentially be targeted for intervention.

We aim to identify and validate key transcription factors (TFs) and functional enhancers that regulate 3 processes associated with the increased stiffness of the microenvironment in 3D cell culture models. 4 Toward this objective, we integrate surrogate readouts that are generated through imaging of 3D colony 5 formation with transcriptomic profiling data leading to a subset of the enriched gene set through correlative 6 analysis. 3D colony formations are imaged using confocal microscopy, where each colony is segmented 7 and represented multi-parametrically [1, 2] as a function of the increased stiffness of the microenvironment. 8 In a parallel assay, transcriptome data are also generated, under identical conditions, and genes that correlate 9 with computed colony (e.g., morphometric) indices are identified and enriched. Subsequently, the enriched 10 gene set is cross-referenced with publicly available ChIP-seq data for inferring regulatory transcription 11 factors. Although gene expression profiling, using microarray or RNA-seq techniques, can identify 12 differentially expressed genes between two or more conditions, the gene list is not sufficient to learn about 13 regulation. Existing methods such as ARACNE [3,4] use transcriptomic profiling to infer cis-regulatory 14 networks. However, these techniques require a large number of samples for expression profiling, and 15 indirect effects, such as secondary regulatory relations, between genes, cannot be distinguished from the 16 direct transcriptional regulatory events. ChIP-seq data measure genome-wide distributions of regulatory or 17 epigenetic factors (transcription factors or histone modifications) that elucidate direct interactions between 18 regulators and their target genes. By taking advantage of publicly available genomic data, including histone 19 modification ChIP-seq and chromatin accessibility DNase-seq data from various cell types, we can predict 20 the functional enhancer elements in the human genome. 21 22 We use a recently developed method called Model-based Analysis of Regulation of Gene Expression 23 (MARGE) [5] to infer the cis-regulatory network. MARGE uses a compendium of published ChIP-seq data 24 for H3K27ac (e.g., acetylation at 27th lysin residue of the Histone H3 protein) and applies a regression-25 based semi-supervised learning approach to predict the functional enhancers of a given gene set. any particular tissue type that we are studying, but can also apply to any cell system regardless of the 40 availability of epigenomic data in the compendium.

42
In a recent study, we examined 3D colony organization as a function of the stiffness of the extracellular 43 matrix (ECM) within the range of mammographic density (MD) [10]. The study concluded that ERBB2 is 44 4 overexpressed in the 3D cultures of MCF10A when the stiffness of the microenvironment is increased to 1 that of high MD in the clinical samples. The stiffness of the MD was measured on clinical histology samples 2 using atomic force microscopy, which varied between 250 to 1800Pa [10]. Overexpression of ERBB2 was 3 hypothesized by integrating computed morphometric indices of colony formation, from microscopy 4 readouts, with the matched gene expression data and then validated using immunofluorescence staining and 5 microscopy. Although we have shown that the relationship between ERBB2 and stiffness of the 6 microenvironment is specific to the 3D cell culture model and MCF10A, the regulators of ERBB2 are the 7 primary motivation of this study. The details of the experimental design, microscopy, and computational 8 analysis are included in the material and method section, i.e., supplementary files for online publication.  Supplementary Fig. 1 and 2. The first experiment was performed with 37 MCF10A, which indicated that (i) YY1 regulates ERBB2 at high stiffness, (ii) if YY1 is turned off with 38 CRISPR, then ERBB2 is never activated either at low or high stiffness values of the microenvironment, 39 and (iii) if YY1 is upregulated then ERBB2 is also activated even at low stiffness of the microenvironment. The following observations are made using the two human mammary epithelial cell lines of MCF10A 6 and 184A1. These two cell lines were selected because, at high stiffness, MCF10A expresses ERBB2 while 7 184A1 does not [10]. Furthermore, MCF10A originates from a patient with the fibrocystic disease, while 8 184A1 originates from the normal patient following reduction mammoplasty. We conclude that (I) 9 Increased stiffness of the microenvironment, within the range normal mammographic density, also In conclusion, integrative morphometric and transcriptomic data has enriched the molecular signature for 29 inference of the cis-regulatory networks. Our integrative analysis suggests that YY1 upregulates ERBB2 in 30 the 3D cell culture of MCF10A when the stiffness of the microenvironment is increased to that of high 31 mammographic density. In addition, YY1 is also highly enriched in 184A1, but not expressed within the 32 same range of the experimental conditions. This is possible because MCF10A is more differentiated and is 33 derived from a benign proliferative breast tissue, which is spontaneously immortalized without defined  by varying concentration of agarose, 1% agarose for low MD and 2.0% agarose for High MD. The initial 1 seeding density for 3D cell culture is 8000 cells per well. The procedure of preparing 3D culture was 2 described as following. The cells were carefully suspended in 40 μL Matrigel matrices (BD). Next, 40 μL 3 of 2.0% and 4% agarose solution (Sigma) were added and thoroughly mixed. The formed gels have final 4 concentrations of agarose at 1.0% and 2.0%. The corresponding stiffnesses were 250 and 1800 Pa, 5 respectively. The mixtures were placed in an 8-well chambered coverglass (Nunc Lab Tek II), and 6 incubated for 20 min, allowing each matrix to gel before adding the growth medium (500 μL/well). Cells 7 were maintained in culture for 8 days while changing the medium once every third day. Changing of the 8 medium was also monitored with phase microscopy for proper growth and colony formation.  was maintained for 1 h. In some cases, samples were stained for a specific molecular endpoint, but in all 39 cases, DAPI nuclear stain was applied for profiling colony organization. Monoclonal rabbit anti-human 40 primary antibodies (e.g., ERBB2 EP1045Y, and YY1 EPR4652) (Abcam) were diluted in blocking solution 41 (1:500), incubated for 1 h, and followed by 3 PBS washes. Goat anti-rabbit secondary antibody conjugated 42 with Alexa 488 and Alexa 565 (A11008 and A11011) (Life Science Technology) was diluted in the 43 blocking solution (1:250), conjugated to primary antibody for 1 h, and followed by 3 PBS washes. Stained samples were imaged using a Zeiss LSM 710 system equipped with a Zeiss Apochromat 40X/1.1 5 (0.8 mm WD) water immersion objective lens. The excitation filters were set at 405, 488 and 568 nm, and 6 the emission filters were set to receive signals between 420-480, 420-550 and 597-700 nm, respectively. 7 The laser intensity was set at 1%. A twin-gate main beam splitter with two wheels, each wheel having 10 8 filter positions (and thus 100 possible combinations), was used to separate excitation and emission beams. 9 The pinhole was set at "1". The digital gain was adjusted to approximately ¾ of the maximum gain, which 10 kept the dynamic range of the pixels between 500-2000 (12 bits). The 3D stack function of the Zen software 11 was used to collect raw 3D information for each colony and was concurrent with "Z correction" of the pixel 12 intensity for thick samples. The voxel size was set to 0.25µm×0.25µm×1µm, the image files were saved as  Consent to publish 20 All authors have read the manuscript and forward their consent to publication.

22
Competing interest 23 The authors declare no financial competition.

25
Funding 26 This work has been supported by NIH R01CA140663 (BP) and K22CA204439 (CZ).

28
Availability of data and materials 29 All gene expression data have been uploaded as a part of the Supplementary Materials. These include raw 30 data and correlative analysis.

32
Competing interest 33 The authors declare no financial competition.