Markers of BRCAness in breast cancer

Background: Mutations in BRCA1 and BRCA2 cause deficiencies in homologous recombination repair (HR), resulting in repair of DNA double-strand breaks by the alternative non-homologous end-joining pathway, which is more error prone. HR deficiency of breast tumors is important because it is associated with better response to platinum salt therapies and to PARP inhibitors. Among other consequences of HR deficiency are characteristic somatic-mutation signatures and transcriptomic patterns. The term “BRCAness” describes tumors that harbor an HR defect but have no detectable germline mutation in BRCA1 or BRCA2. A better understanding of the genes and molecular aberrations associated with BRCAness could provide mechanistic insights and guide development of targeted treatments.

Methods: Using The Cancer Genome Atlas (TCGA) genomic data from breast cancers in 1101 patients, we identified tumors with BRCAness based on somatic mutations, homozygous deletions, and hypermethylation of BRCA1 and BRCA2. We then evaluated germline mutations, somatic mutations, homozygous deletions, and hypermethylation of 24 other breast-cancer predisposition genes. Using somatic-mutation signatures, we compared these groups against tumors from 44 TCGA patients with germline mutations in BRCA1 or BRCA2. We also compared gene-expression profiles of tumors with BRCAness versus tumors from BRCA1 and BRCA2 mutation carriers. A statistical resampling approach enabled objective quantification of similarities among tumors, and dimensionality reduction enabled graphical characterizations of these relationships.

Results: Somatic-mutation signatures of tumors having a BRCA1/BRCA2 somatic mutation, homozygous deletion, or hypermethylation (n = 64) were markedly similar to each other and to tumors from BRCA1/BRCA2 germline carriers (n = 44). Furthermore, somatic-mutation signatures of tumors with germline or somatic events in BARD1 or RAD51C showed high similarity to tumors from BRCA1/BRCA2 carriers. These findings coincide with the roles of these genes in HR and support their candidacy as genes critical to BRCAness. As expected, tumors with either germline or somatic events in BRCA1 were enriched for basal gene-expression features.

Conclusions: Somatic-mutation signatures reflect the effects of HR deficiencies in breast tumors. Somatic-mutation signatures have potential as biomarkers of treatment response and to decipher the mechanisms of HR deficiency.


Approximately 1-5% of breast-cancer patients carry a pathogenic germline variant in either BRCA1 or BRCA2 [1-5]. These genes play important roles in homologous recombination repair (HR) of double-stranded breaks and stalled or damaged replication forks [6,7]. When the BRCA1 or BRCA2 gene products are unable to perform HR, cells may resort to non-homologous end-joining, a less effective means of repairing double-stranded breaks, potentially leading to an increased rate of DNA mutations [8-11]. Patients who carry biallelic loss of BRCA1 and BRCA2 due to germline variants and/or somatic mutations often respond well to poly ADP ribose polymerase (PARP) inhibitors and platinum-salt therapies, which increase the rate of DNA [...]

An underlying assumption of the BRCAness concept is that the effects of HR deficiency are similar across tumors, regardless of the genes that drive those deficiencies and despite considerable variation in genetic backgrounds, environmental factors, and the presence of other driver mutations. Based on this assumption, and in a quest to identify candidate markers of BRCAness, we performed a systematic evaluation of multiomic and clinical data from 1101 patients in The Cancer Genome Atlas (TCGA) [24]. In performing these evaluations, we characterized each tumor using two types of molecular signature: 1) weights that represent the tumor's somatic-mutation profile and 2) mRNA expression values for genes used to assign tumors to the PAM50 subtypes [36,37]. In this way, we sought to characterize the effects of HR defects in a comprehensive yet clinically interpretable manner. To evaluate similarities among tumors based on these molecular profiles, we used a statistical-resampling approach designed to quantify similarities among patient subgroups, even when those subgroups are small, thus helping to account for rare events. We use aberration as a general term to describe germline mutations, somatic mutations, copy-number deletions, and hypermethylation events.

Data preparation and filtering

We obtained breast-cancer data from TCGA for 1101 patients in total. To determine germline-mutation status, we downloaded raw sequencing data from CGHub [38] for normal (blood) samples. We limited our analysis to whole-exome sequencing samples that had been sequenced using Illumina Genome Analyzer or [...]

To call DNA variants, we used freebayes (version v0.9.21-18-gc15a283) [45] and Pindel (https://github.com/genome/pindel). We used freebayes to identify single-nucleotide variants (SNVs) and small insertions or deletions (indels); we used Pindel to identify medium-sized insertions and deletions.

Having called these variants, we used snpEff (version 4.1) [46] to annotate the variants and GEMINI (version [...]

We identified somatic SNVs and indels for each patient by examining variant calls that had been made using Mutect [52]; these variants had been made available via the Genomic Data Commons [53]. We used the following criteria to exclude somatic variants: 1) synonymous variants, 2) variants that snpEff classified as having a "LOW" or "MODIFIER" effect on protein sequence, 3) variants that SIFT [54] and Polyphen2 [55] both suggested to be benign [56], and 4) variants that were observed at greater than 1% frequency across all populations in ExAC [57]. For BRCA1 and BRCA2, we examined candidate variants based on all available sources of evidence and the University of Washington, Department of Laboratory Medicine clinical database as described previously [58]. We compared our classifications to those publicly reported in the ClinVar database [59] when available and found complete concordance. Based on these criteria, we categorized each variant as pathogenic, likely pathogenic, variant of uncertain significance (VUS), likely benign, or benign. Then we examined the ClinVar database [60] for evidence that VUS or likely benign variants had been classified by others as pathogenic; however, none met this criterion. To err on the side of sensitivity, we treated any BRCA1 or BRCA2 variant as "mutated" if it fell into our pathogenic, likely pathogenic, or VUS categories.
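
The four exclusion criteria for somatic variants, and the rule for counting a BRCA1/BRCA2 variant as "mutated", can be sketched as a pair of simple predicates. This is an illustration in Python rather than the authors' R code. The `Variant` record and its field names are hypothetical, and it assumes the SIFT/Polyphen2 predictions have already been reduced to booleans and the ExAC frequency to a single overall value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    """Hypothetical record; field names are illustrative, not from the paper."""
    gene: str
    is_synonymous: bool
    snpeff_impact: str                # snpEff impact: HIGH, MODERATE, LOW, or MODIFIER
    sift_benign: Optional[bool]       # None when no SIFT prediction is available
    polyphen_benign: Optional[bool]   # None when no Polyphen2 prediction is available
    exac_af: Optional[float]          # assumed overall ExAC frequency; None if absent


def keep_somatic_variant(v: Variant) -> bool:
    """Return True if the variant survives all four exclusion criteria."""
    if v.is_synonymous:                                  # criterion 1
        return False
    if v.snpeff_impact in {"LOW", "MODIFIER"}:           # criterion 2
        return False
    if v.sift_benign is True and v.polyphen_benign is True:   # criterion 3: both benign
        return False
    if v.exac_af is not None and v.exac_af > 0.01:       # criterion 4: > 1% frequency
        return False
    return True


# Classifications that count toward "mutated" status for BRCA1/BRCA2.
COUNTED_AS_MUTATED = {"pathogenic", "likely pathogenic", "VUS"}


def counts_as_mutated(classification: str) -> bool:
    """Sensitivity-favoring rule: VUS is counted alongside (likely) pathogenic."""
    return classification in COUNTED_AS_MUTATED
```

Note that criterion 3 removes a variant only when both predictors agree that it is benign; a variant with a single benign call, or with a missing prediction, is retained in this sketch.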

Using the somatic-mutation data for each patient, we derived mutation-signature profiles using the deconstructSigs (version 1.8.0) R package [61]. As input to this process, we used somatic-variant calls that had not been filtered for pathogenicity, as a way to ensure adequate representation of each signature. The output of this process was a vector for each tumor that indicated a "weight" for each signature [19]. Figures S1-S2 illustrate these weights for two tumors that we analyzed.

We downloaded DNA methylation data via the Xena Functional Genomics Explorer [62]. These data were generated using the Illumina HumanMethylation27 and HumanMethylation450 BeadChip platforms. For the HumanMethylation27 arrays, we mapped probes to genes using a file provided by the manufacturer (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL8490). For the HumanMethylation450 arrays, we mapped probes to genes using an annotation file created by Price, et al. [63] (see http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL16304). Typically, multiple probes mapped to a given gene. Using probe-level data from BRCA1, BRCA2, PTEN, and RAD51C, we performed a preliminary analysis to determine criteria for selecting and summarizing these probe-level values. Because these genes are tumor suppressors, we started with the assumption that in most cases, the genes would be methylated at low levels. We also assumed that probes nearest the transcription start sites would be most informative. Upon plotting the data (Figure S3), we decided to limit our analysis to probes that mapped to the genome within 300 nucleotides of each gene's transcription start site. In some cases, probes appeared to be faulty because they showed considerably different methylation levels ("beta" values) than other probes in the region (Figure S3). To mitigate the effects of these outliers, we calculated gene-level methylation values as the median beta value across any remaining probes for that gene. Then, to identify tumors that exhibited relatively high beta values, and thus could be considered to be hypermethylated, we used the getOutliersII function in the extremevalues R package (version 2.3.2) [64] to detect outliers. When invoking this function, we specified the following non-default parameter values: distribution = "exponential", alpha = c(0.000001, 0.000001).

We downloaded copy-number-variation data from the Xena Functional Genomics Explorer [62]. These data had been generated using Affymetrix SNP 6.0 arrays; CNV calls had been made using the GISTIC2 method [65].
The CNV calls had also been summarized to gene-level values using integer-based discretization. We focused on tumors with a gene count of "-2", which indicates a homozygous deletion.

We used RNA-Sequencing data that had been aligned and summarized to gene-level values using the original TCGA pipeline [24]. To facilitate biological and clinical interpretation, we limited the gene-expression data to the Prosigna™ Breast Cancer Prognostic Gene Signature (PAM50) genes [66]. Netanely, et al. had previously published PAM50 subtypes for TCGA breast cancer samples; we reused this information in our study [67]. For each of these genes, we also sought to identify tumors with unusually low expression levels. To do this, we used the getOutliersI function in the extremevalues package to identify outliers. We used the following [...]

To prepare, analyze, and visualize the data, we wrote computer scripts in the R programming language [70]. In writing these scripts, we used the following packages: readr [71] [...]

To reduce data dimensionality for visualization purposes, we applied the Barnes-Hut t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm [83,84] to the mutation signatures and PAM50 expression profiles. This reduced the data to two dimensions, which we plotted as Cartesian coordinates. To quantify homogeneity within a group of tumors that harbored a particular aberration, we calculated the pairwise Euclidean distance between each patient pair in the group and then calculated the median pairwise distance [85]. When comparing two groups, we used a similar approach but instead calculated the median distance between each pair of individuals in either group. To determine whether the similarity within or between groups was statistically significant, we used a permutation approach. We randomized the patient identifiers, calculated the median pairwise distance within (or between) groups, and repeated these steps 10,000 times. This process resulted in an empirical null distribution against which we compared the actual median distance. We then derived empirical p-values by calculating the proportion of randomized median distances that were larger than the actual median distance.
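
The resampling procedure can be illustrated with a short, standard-library Python sketch (the study itself used R). Several details here are assumptions rather than the authors' implementation:

- Randomizing patient identifiers is modeled as drawing a random group of the same size from all patients.
- The between-group distance is read as the median over pairs with one member from each group.
- The p-value is computed as the fraction of permuted medians at or below the observed median, with a +1 correction, so that small values indicate unusually similar tumors, as in the reported results.

All function names are hypothetical.

```python
import math
import random
from itertools import combinations, product
from statistics import median


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def median_within(profiles, members):
    """Median pairwise distance among the patients indexed by `members`."""
    return median(euclidean(profiles[i], profiles[j])
                  for i, j in combinations(members, 2))


def median_between(profiles, group_a, group_b):
    """Median distance over pairs with one patient from each group (assumed reading)."""
    return median(euclidean(profiles[i], profiles[j])
                  for i, j in product(group_a, group_b))


def permutation_p_value(profiles, members, n_permutations=10_000, seed=0):
    """Empirical p-value for within-group homogeneity.

    Counts how often a random group of the same size is at least as
    homogeneous (median distance <= observed) as the actual group.
    """
    rng = random.Random(seed)
    observed = median_within(profiles, members)
    all_ids = range(len(profiles))
    hits = 0
    for _ in range(n_permutations):
        shuffled_group = rng.sample(all_ids, len(members))
        if median_within(profiles, shuffled_group) <= observed:
            hits += 1
    # +1 correction (an assumption) keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)
```

Here `profiles` would hold one vector per tumor (for example, signature weights), and `members` the indices of tumors sharing an aberration. A between-group test would follow the same pattern, permuting group labels and recomputing `median_between`.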

We used clinical and molecular data from breast-cancer patients in TCGA to evaluate the downstream effects of BRCA1 and BRCA2 germline mutations. We evaluated two types of downstream effect: 1) expression levels of genes that are used to classify tumors into the PAM50 subtypes [36,37] and 2) signatures that reflect a tumor's overall somatic-mutation profile in a trinucleotide context [18,19]. We used expression data for the PAM50 genes due to their biological and clinical relevance. We used somatic-mutation signatures because [...] for most BRCA1 and BRCA2 carriers was "Signature 3"; however, other signatures (especially 1A) were also common (Figure S7). Figure S8 shows the overlap between these two types of molecular profile.

Although it is useful to evaluate breast-cancer patients based on the primary subtype or signature associated with each tumor, tumors are aggregates of multiple subtypes and signatures. To account for this diversity, we characterized tumors based on 1) gene-expression levels for all available PAM50 genes and 2) all 27 somatic-mutation signatures. To enable visualization, we used the t-SNE technique to reduce the dimensionality of these profiles. Generally, tumors with the same primary subtype or signature clustered together in these visualizations (Figures 2-3); however, in some cases, this did not happen. For example, the dimensionally reduced gene-expression profiles for Basal tumors formed a tight, distinct cluster (Figure ??). But some Basal tumors were distant from this cluster, and one "Normal-like" tumor was located in this cluster. Similarly, tumors assigned to somatic-mutation "Signature 3" formed a cohesive cluster (Figure 3), but some "Signature 3" tumors were separate. These observations highlight the importance of evaluating molecular profiles as a whole, not just using a single, primary category.

Under the assumption that BRCA1/BRCA2 germline variants exhibit recognizable effects on tumor [...] Figures 5B and S10B). None of the three BRCA1 carriers who lacked LOH events clustered closely with the remaining BRCA1 tumors (Figure 5A). Of the 7 BRCA2 tumors without detected LOH events, 4 were among those that failed to cluster closely with the remaining BRCA2 tumors (Figure 5B). These observations confirm that germline BRCA1/BRCA2 mutations leave a recognizable imprint on a tumor's mutational landscape but that this imprint is more likely in combination with a second "hit" in the same gene [19,32,86].

Next we evaluated similarities between BRCA1 and BRCA2 germline carriers. Although some BRCA2 carriers fell into the Basal gene-expression subtype, overall profiles for these patients were dissimilar to those from BRCA1 carriers (p = 0.99; Figures 4A-B and S11A). However, the opposite held true for somatic-mutation signatures: tumors from BRCA1 and BRCA2 carriers were highly similar to each other (p = 0.0001; Figures 5A-B and S12A).

A somatic mutation, homozygous deletion, or DNA hypermethylation occurred in BRCA1 or BRCA2 for 64 patients (Figure 1B-D). Most of these events were mutually exclusive with each other and with germline variants (Figure S13). Whether for PAM50 subtypes or somatic-mutation signatures, tumors with BRCA1 hypermethylation were relatively homogeneous and highly similar to tumors from BRCA1 germline carriers (Figures 4G, 5G, S9G, S10G; Table 1). For PAM50 gene expression, no other aberration type showed significant similarity to BRCA1 germline mutations. Somatic-mutation signatures from tumors with BRCA1 somatic mutations or homozygous deletions were significantly similar to those from BRCA1 germline mutations (Table 1). Only 2 tumors had BRCA2 hypermethylation, but the mutational signatures for these samples were significantly similar to tumors from BRCA2 germline carriers (p = 0.0014; Figure 5H). Likewise, BRCA2 somatic mutations and homozygous deletions produced mutational signatures that were similar to germline BRCA2 carriers (Table 1; Figures 5D and 5F). Based on these findings, we conclude that disruptions of BRCA1 and BRCA2 exert similar effects on somatic-mutation signatures, but not on PAM50 gene expression, whether those disruptions originate in the germline or via somatic processes. To provide further evidence, we aggregated all patients who had any type of BRCA1 or BRCA2 aberration into a BRCAness reference group. As a whole, mutational signatures for this group were much more homogeneous than expected by chance (p = 0.0001; Figure S14). We used this reference group to evaluate other criteria that might classify patients into the BRCAness category. For our remaining evaluations, we used somatic-mutation signatures rather than PAM50 gene expression because they coincided so consistently with BRCA aberration status, in line with the definition of BRCAness as an HR defect [30].

We examined data for 24 additional breast-cancer predisposition genes and evaluated whether molecular aberrations in these genes result in mutational signatures that are similar to our BRCAness reference group. We found pathogenic and likely pathogenic germline mutations in 15 genes. The most frequently mutated were CHEK2, ATM, and NBN (Figures S15 and S16). We found potentially pathogenic somatic mutations in all 24 genes, most frequently in TP53, CDH1, and PTEN (Figures S17 and S18). Homozygous deletions occurred most frequently in PTEN, CDH1, and CHEK1 (Figures S19 and S20). Finally, 5 genes were hypermethylated (Figures S21 and S22). Typically, these events were rare for a given gene. Using our resampling approach, we compared each aberration type in each gene against the BRCAness reference group.

Furthermore, somatic mutations, homozygous deletions, and hypermethylation of BRCA1 and BRCA2 had downstream effects similar to germline mutations in these genes. As a whole, tumors with any BRCA1/BRCA2 aberration formed a cohesive group, against which we compared other tumors. For a gene to be considered a strong BRCAness biomarker candidate, we required that at least two types of molecular aberration show significant similarity to the BRCAness reference group, suggesting that aberrations in the gene leave a recognizable imprint on the somatic-mutation landscape. This allowed us to derive insights even though a single type of aberration may have occurred rarely in a given gene. Two genes met these criteria: [...] (Table 2). These included germline mutations in PALB2 and RAD51B, which have a clear mechanistic link to BRCA1 and BRCA2. Determining which germline mutations are pathogenic remains a challenging task, so it is possible that more or less stringent filtering of candidate aberrations would lead to more consistent results. In addition, it is likely that mono-allelic inactivation of these and other genes may be insufficient to impair HR function [51]. Tumors with homozygous deletions in TP53 were significantly similar to the BRCAness group; somatic mutations in this gene showed considerable overlap with the BRCAness tumors, but this similarity did not reach statistical significance. TP53 has long been recognized as an important gene in breast cancer, and mutations in this gene have been shown to associate with germline mutations in BRCA1 and BRCA2 [93,94]. However, because TP53 mutations occur frequently in breast cancer overall, they may be sensitive but non-specific biomarkers of BRCAness. Perhaps TP53 aberrations act as secondary events that compromise genomic integrity in combination with initiating events in the HR pathway.

Although the mutational-signature patterns we observed were highly consistent in many cases, it remains to [...]

Our statistical-resampling approach uses Euclidean distances to evaluate similarity (see Methods). For visualization, we used a two-dimensional representation of the same data. In most cases, these two methods [...]

[...] who harbored a specific type of aberration in a candidate BRCAness gene. We evaluated whether somatic-mutation signatures from patients who harbored a given type of aberration (e.g., BARD1 germline mutation) were more similar to the BRCAness reference group than expected by random chance. The numbers in this table represent empirical p-values. In cases where no patient had a given type of aberration in a given gene, we list "N/A". The "Any" group represents individuals who harbored any type of aberration in a given gene.

Gene | Germline mutation | Somatic mutation | Homozygous deletion | Hypermethylation | Any