Sequence-to-expression approach to identify etiological non-coding DNA variations in P53 and cMYC-driven diseases

Disease risk prediction based on DNA sequence and transcriptional profile can improve disease screening, prevention, and potential therapeutic approaches by revealing contributing genetic factors and altered networks. Despite identifying many disease-associated DNA variants through genome-wide association studies, distinguishing deleterious non-coding DNA variations remains poor for most common diseases. We previously reported that non-coding variations disrupting cis-overlapping motifs (CisOMs) of opposing transcription factors significantly affect enhancer activity. Analyzing publicly available ChIP-seq data for P53 and cMYC in human embryonic stem cells and mouse embryonic cells showed that ∼344-366 genomic regions are co-occupied by P53 and cMYC. We identified, on average, two CisOMs per region, suggesting that co-occupancy is evolutionarily conserved in vertebrates. Therefore, we designed in vitro experiments to uncover the significance of the co-occupancy and competitive binding and inhibition between P53 and cMYC on target gene expression. We found that treating U2OS cells with doxorubicin increased P53 protein level while reducing cMYC level. In contrast, no change in protein levels was observed in Raji cells. ChIP-seq analysis showed that 16-922 genomic regions were co-occupied by P53 and cMYC before and after treatment, and substitutions of cMYC signals by P53 were detected after doxorubicin treatment in U2OS. Around 187 expressed genes near co-occupied regions were altered at mRNA level according to RNA-seq data. We utilized a computational motif-matching approach to determine that changes in predicted P53 binding affinity by DNA variations in CisOMs of co-occupied elements significantly correlate with alterations in reporter gene expression. We performed a similar analysis using SNPs mapped in CisOMs for P53 and cMYC from ChIP-seq data in U2OS and Raji, and expression of target genes from the GTEx portal. We found a significant correlation between change in motif-predicted cMYC binding affinity by SNPs in CisOMs and gene expression. In conclusion, our study suggests a generally applicable approach to filter etiological non-coding variations associated with P53 and cMYC-dependent diseases. Author Summary Most DNA variants associated with common complex diseases fall outside the protein-coding regions of the genome, making them hard to detect and relate to a function. Although many computational tools are available for prioritizing functional disease risk variants outside the protein-coding regions of the genome, the precision of prediction of these tools is mostly unreliable and hence not close to cancer risk prediction. This study brings to light a novel way to improve prediction accuracy of publicly available tools by integrating the impact of cis-overlapping binding sites of opposing cancer proteins, such as P53 and cMYC, in their analysis to filter out deleterious DNA variants outside the protein-coding regions of the human genome. Using a biology-based statistical approach, DNA variants within cis-overlapping motifs impacting the binding affinity of opposing transcription factors can significantly alter the expression of target genes and regulatory networks. This study brings us closer to developing a generally applicable approach capable of filtering etiological non-coding variations in co-occupied genomic regions of P53 and cMYC family members to improve disease risk assessment.


Introduction
of all types of cancer), while cMYC is overexpressed in many types of cancer [24,30]. 168 CisOMs of P53 and cMYC, thereby are promising targets for detecting etiological 169 variants using a computational approach that can be expanded to include P53 and 170 cMYC family members. 171 We analyzed ChIP-seq data from published studies for P53 and cMYC, which 172 showed a similar number of co-occupied regions and CisOMs between human 173 embryonic stem cells and murine embryonic cells. To study CisOMs of P53 and cMYC 174 in the same cells, we used U2OS and Raji cancer cells untreated and treated with 175 doxorubicin. We used computational analysis of our ChIP-seq data to identify a 176 substantial number of P53 and cMYC co-occupied regions before and after treatment. 177 RNA-seq analysis showed the differently expressed genes (DEGs) after treatment with 178 doxorubicin and we bioinformatically identified the DEGs that are close to co-occupied 179 regions by P53 and cMYC. We further propose a biology-based sequence-to-180 expression statistical approach capable of assessing etiological non-coding DNA 181 variations. We mapped putative regulatory elements with important molecular features 182 bound by P53 and cMYC, including the number of binding sites within 100bp, CisOMs 183 and binding affinity of each site. We also included luciferase assay data of regulatory 184 elements that contain SNPs within CisOMs for P53 and cMYC in the statistical 185 approach model. Therefore, this study is an important step toward developing a 186 statistical approach or a filter to uncover etiological DNA variation for essential 187 transcriptional factors, such as P53 and cMYC, that determine cell fate and are often 188 dysregulated in cancer [24,30]. 189

190
Analysis of publicly available ChIP-seq data of P53 and cMYC 191 genomic binding in similar cell types 192 193 To identify co-occupied genomic regions containing CisOMs for P53 and cMYC, 194 we initially used previously published ChIP-seq studies that experimentally mapped P53 195 and cMYC binding regions in similar cell types. We analyzed the ChIP-seq data for P53 196 and cMYC in human embryonic stem cells (hESC) and murine embryonic cells (mECs) 197 ( Fig 1A). We performed the analysis for 4968 P53 peaks from Akdemir, Jain The co-occupied regions contain a total of 926 CisOMs for cMYC and P53, and a total 206 of 69 SNPs within CisOMs of mouse genome. Notably, 31 genes are potentially 207 regulated by P53 and cMYC co-occupied regions with CisOMs in both human and 208 mouse embryonic cells (Fig 1B). According to gene ontology analysis using DAVID [34], 209 these genes are involved in apoptosis, DNA damage repair, kinases, phosphatases, cell 210 cycle inhibitors, and cMYC pathway. Occupancy at regulatory elements is cell-type and protein-level dependent. 215 Therefore, we sought to determine the dynamics of the competitive binding between 216 P53 and cMYC in two cancer cell types. We used the U2OS osteosarcoma cells and 217 Raji Burkitt's lymphoma cells because previous ChIP-seq data were conducted in these 218 cell lines. Optimized experiments were conducted in U2OS and Raji cell lines to identify 219 genomic cMYC and P53 binding signals before and after treatment with doxorubicin 220 (Dox), a DNA-damaging drug (Fig 2A). The total protein level of cMYC and P53 were 221 measured in cells pooled from multiple culture plates before and after treatment with 222 Dox to assess the protein amount ( Fig 2B). The level of cMYC and P53 did not change 223 in Raji cells after Dox treatment. On the other hand, the level of P53 was remarkably 224 increased, and that of cMYC was reduced in U2OS after Dox treatment. We then used 225 the pooled cancer cells untreated and treated with Dox for the RNA-seq and ChIP-seq 226

analysis. 227
Transcriptional profile before and after treatment with 228 doxorubicin 229 230 RNA-seq was performed to determine the differentially expressed genes (DEGs) 231 after treatment with doxorubicin in U2OS and Raji cells. The transcriptional profile aims 232 to link the DEGs to the co-occupied genomic regions by P53 and cMYC in U2OS and 233 Raji cells (Fig 3A and B). We used three biological replicates for each cell line before 234 and after treatment. The RNA-seq was conducted for 12 samples, and the 235 transcriptional profile was generated for each sample to determine homogeneity among 236 biological replicates (S1 Fig). Due to heterogeneity, we excluded one of the three 237 biological replicates of U2OS-Dox treated cells. After Dox treatment, around 470 genes 238 were upregulated and 173 genes were downregulated in U2OS cells, while 304 genes 239 were upregulated and 599 genes were downregulated in Raji cells (Table 1). Gene 240 ontology analysis of the DEGs in U2OS showed that these genes are involved in cell 241 cycle, nuclear and cell division, DNA damage response, mitotic metaphase progression, 242 and sister chromatid cohesion (Table 2). Similarly, gene ontology analysis of DEGs in 243 Raji cells showed that these genes are involved in cell division, apoptosis, cell cycle 244 regulation, DNA damage and inflammatory response, nuclear division, and negative 245 regulation of transcription from RNA polymerase II promoter (Table 3). 246    of four different binding scenarios. First, in some cases, we saw an enrichment of P53 265 genomic binding signals due to Dox treatment in Raji cells (Fig 4A, dashed box). 266 Second, increased cMYC signals was detected in a few regions around the ROCK1P1 267 intronic region in U2OS cells, however, the signal intensity was reduced (Fig 4B,  268 dashed box). Third, there were cases where there was no change in intensity of P53 269 signals in U2OS cells before and after treatment with Dox, and there was a variable 270 increase in P53 signals in Raji cells as well (Fig 4C, dashed box). Fourth, in some 271 cases, new P53 signals and replacement of cMYC signals can be detected in U2OS 272 cells within the same regions after treatment with Dox ( Fig 4D, dashed box). However, 273 these are not all possible scenarios of P53 and cMYC binding and enrichment. The 274 number of co-occupied regions was bioinformatically assessed under these conditions. 275  within CisOMs is large in the PLEC gene (~400 putative variants per region) ( Fig 5E). In 297 addition, these tentative DNA variants are enriched in Chr8 a treatment ( Fig 5F). 298 Similarly, in U2OS cells after treatment with doxorubicin, there is an enrichment in the 299 number of tentative DNA variants (> 200 putative variants per region) in the HGH1 gene 300 ( Fig 5G). Notably, there is an increased number of variants in chr7 and chr8 ( Fig 5H). 301

304
We decided to assess 100 bp regions around the identified CisOMs as potentially 305 minimal regulatory elements containing a SNP within CisOMs for P53 and cMYC to 306 generate a statistical modeling approach. An illustrative diagram shows the position of 307 the SNPs within CisOMs of co-occupied elements by P53 and cMYC (Fig. 6A). 308 Luciferase assay data from previous [18, 21] and the current study was used as a 309 readout of the P53 and cMYC co-occupied element activity. For the statistical approach, 310 the selected SNP is located at the center of the element within the CisOMs for P53 and 311 cMYC ( Fig 6A, black arrow). For each SNP, a "delta score" was defined as the variant 312 element-driven expression of luciferase activity minus the common/reference sequence 313 element-driven expression of luciferase activity, thus reflecting the expression impact of 314 the SNP. The delta-P53 scores were computationally estimated based on the change in 315 predicted binding strength ("delta binding") due to an SNP as the difference in motif 316 binding score of the variant and reference alleles ( Fig 6B, S2 Table). On average for the 317 ChIP-seq data, a total of about 9.59 cMYC and 6.98 P53 binding sites in Raji cells, and 318 8.5 cMYC and 10.5 P53 binding sites in U2OS, were mapped within 100 bp co-occupied 319 regions which were considered a minimal enhancer element (S1 Table). The correlation 320 coefficient observed between altered luciferase expression (delta-score) and disruption 321 of P53 binding affinity (delta binding) by SNPs mapped within CisOMs was significant 322 with a p-value of 0.00355. Delta-cMYC binding strength was plotted with the delta score 323 of the luciferase expression ( Fig 6C). However, there is no noticeable pattern with 324 cMYC data with most tested SNPs exhibiting no change in predicted binding affinity. To 325 further expand this statistical approach to in vivo data, SNPs within identified CisOMs of 326 P53 and cMYC in Raji and U2OS were used to identify likely target genes and obtain 327  Table). Expression from the homozygous alternative alleles was not 331 used for correlation due to a lack of normalized expression values for many of the 332 analyzed SNPs in GTEx portal. Delta expression (delta-score) and delta binding scores 333 were calculated and used for correlation coefficient analysis. An insignificant correlation 334 was observed between altered binding affinity of P53 and the expression of genes 335 associated with the SNPs within CisOMs obtained from our ChIP-seq data and GTEx 336 portal ( Fig 7A, S2 Fig). However, a significant correlation between delta-cMYC and 337 delta-score was obtained with a p-value of 0.0163 (Fig 7B). Repeating the same 338 correlation analysis using CisOM-SNPs that have zero delta-binding scores, we also 339  Table). There is no significant correlation between the delta expression scores and 346 delta binding scores for either p53 or cMYC when restricting the set of SNPs to those 347 located within a single binding site of p53 or cMYC respectively (Fig 8A and B, S3 Fig).  To help tackle these limitations, this study used a well-controlled and optimized in 361 vitro experiment to validate the importance of P53 and cMYC competitive binding and 362 inhibition within CisOM elements. We hypothesize that P53 and cMYC competitive 363 binding and inhibition at co-occupied regions are crucial for regulating expression level 364 of target genes, and that SNPs located within CisOMs have more substantial 365 deleterious effects on target genes than DNA variations in single binding sites. We Furthermore, gene ontology analysis suggested that the competitive inhibition at 400 CisOMs between P53 and cMYC is involved in controlling apoptosis, cell cycle, 401 response to DNA damage, and radiation in humans and mice. 402 To validate these findings in the same cells, the genomic binding of P53 and 403 cMYC was studied in two cancer cell lines, Raji and U2OS, expressing P53 and cMYC.

Protein analysis by western blot 492
Immunoblotting was performed on total protein extracts from untreated and treated 493 Raji and U2OS cells to detect the protein levels of P53 and cMYC. We used mouse anti-494 P53 (sc-126) and mouse anti-cMYC (sc-40) for the immunoblot. Odyssey Li-Cor system 495 was utilized to visualize the protein bands and intensity, and beta-actin was used as a 496 loading control. 497

RNA-Seq analysis was conducted on untreated and treated Raji and U2OS cells 499
with doxorubicin to identify differentially expressed genes after treatment. The total RNA 500 samples extracted from cancer cells were submitted to Genewiz company, which 501 performed an independent test on RNA quality and concentration. Illumina HiSeq2500 502 platform in a 2x100bp paired-end configuration was used to obtain 35 million reads on 503 average for each library sample, and the analysis of the DEGs was performed using the 504 Bioconductor edgeR package. Normalized read count by the total read count per sample 505 was converted to log counts per million (log CPM) metrics, and used for clustering and 506 generating a heat map using the clustered image maps (CIM) function in the mixOmics 507 R package. 508

509
The raw reads obtained from paired-end RNA Sequencing were mapped to human 510 reference genome hg38 using STAR [51] with default parameters to obtain count values. between the two half-sites. Thus, two half-sites of p53 will be considered as a site only if 538 the distance between them is less than 15 base pairs. The LR score for this combined  To compare the impact of SNPs within CisOMs to non-CisOMs, the normalized 580 expression for SNPs within single binding sites of p53 or cMYC was collected for SNPs 581 in chromosome 8:48007568-96260804 due to a large number of SNPs. The median 582 values of the normalized expression of genes associated with the SNPs were collected 583 from the common homozygous, heterozygous, and alternative homozygous alleles of 584 the analyzed SNPs. In addition, the normalized expressions were mainly obtained from 585 the whole human blood, skin, lungs, and breast tissues. Delta-scores and delta-p53 and 586 cMYC scores were calculated, and correlation coefficient analysis was performed using 587  illustrates how p53 and cMYC co-occupied regions from ChIP-seq data selected from 603 studies in human embryonic stem cells and murine embryonic stem cells were used to 604 map CisOMs of p53 and cMYC (A). CisOMs within co-occupied regions were identified 605 using a python script. GREAT was used to reveal the potential target genes regulated 606 by the CisOMs and USCS browser was used to identify SNPs within the CisOMs.