Introduction

Breast cancer is a multifactorial disease caused by genetic and environmental factors [1]. So far, genetic studies have identified four high-penetrance genes (BRCA1, BRCA2, TP53, and PTEN) related to breast cancer [2]. In addition, genetic variations including single-nucleotide polymorphisms (SNPs), small insertion–deletion polymorphisms, and variable numbers of repetitive sequences have been reported to be associated with breast cancer risk. These comprise 51 variants in 40 genes, with the evidence graded as strong for 10 variants in 6 genes (ATM, CASP8, CHEK2, CTLA4, NBN, and TP53), moderate for 4 variants in 4 genes (ATM, CYP19A1, TERT, and XRCC3), and weak for 37 variants [3].

Another type of variation in the human genome is genomic structural variation, which includes copy number variations (CNVs) [4]. CNVs involve gains or losses of several to hundreds of kilobases of genomic DNA among phenotypically normal individuals, and at least 11,700 CNV regions larger than 443 bp have been identified [5]. CNVs have been shown to significantly influence messenger RNA expression levels [6, 7], and recent studies have described associations of CNVs with various common disorders [8] as well as with mental illness [9]. For example, the Wellcome Trust Case Control Consortium identified three CNVs associated with common diseases: IRGM for Crohn’s disease; HLA for Crohn’s disease, rheumatoid arthritis, and type 1 diabetes; and TSPAN8 for type 2 diabetes [10]. In regard to neoplasms, CNVs have recently been reported as factors predisposing individuals to neuroblastoma, prostate cancer, pancreatic cancer, colorectal cancer, and BRCA1-associated ovarian cancer [6, 11–15]. Although CNVs are expected to affect breast cancer risk, little is known about this association except for a previous report in which the proportion of rare CNVs was excessive in patients with hereditary breast cancer without BRCA1/BRCA2 mutations compared with controls [16]. These gaps in our knowledge prompted us to study this relation. Here, we report that CNVs significantly affect the susceptibility to breast cancer.

Materials and methods

The study protocol was approved by the institutional review board of Yamaguchi University Graduate School of Medicine, and informed consent was obtained from each patient.

Screening of CNVs by array comparative genomic hybridization

We obtained 30 DNA samples from the peripheral blood of women without a history of breast cancer and 30 DNA samples from the peripheral blood of patients with a history of breast cancer. A pool of blood-derived DNA from the 30 healthy women was used as a reference sample for all hybridizations performed. Assessment of the CNVs in the human genome by oligonucleotide array comparative genomic hybridization (CGH) (human CGH 2.1 M whole-genome tiling array; Roche NimbleGen) was performed according to the manufacturer’s protocol. Array image analysis and normalization were performed with NimbleScan version 2.5 software (Roche NimbleGen). The normalized data were then processed using Nexus Copy Number version 5.0 software (BioDiscovery).

Copy number validation by real-time polymerase chain reaction

Quantitative real-time polymerase chain reaction (PCR) using predesigned TaqMan® Copy Number Assays (Applied Biosystems) containing a primer pair and a FAM dye-labeled minor groove binder (MGB) probe was performed to detect the copy number of the genomic sequence of interest using a larger cohort. For the internal control, a predesigned TaqMan® Copy Number Reference Assay RNase P (Applied Biosystems), which targets a sequence known to exist in two copies in a diploid genome, was used. We obtained 193 DNA samples from the peripheral blood of patients with a history of breast cancer and 170 DNA samples from age-matched women without a history of breast cancer. The mean age was 57.3 years in the patient group and 55.6 years in the control group. There was no statistical difference in age distribution between the groups. The calibrator sample for quantitative real-time PCR was the DNA pooled from 30 healthy women, the same pool used as the reference in the array CGH assay; its copy number was assumed to be 2. The 7900HT system and the StepOnePlus system (Applied Biosystems) were used for the quantitative real-time PCR analysis. The PCRs were carried out according to the manufacturer’s protocol.
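Relative quantification against a two-copy reference assay and a calibrator sample is commonly computed with the comparative Ct (ΔΔCt) method. The following is a minimal sketch under that assumption; the text does not state the exact calculation used, and the function name, the Ct values, and the assumption of roughly 100 % amplification efficiency are illustrative only.

```python
def copy_number_ddct(ct_target, ct_reference, ct_target_cal, ct_reference_cal,
                     calibrator_copies=2.0):
    """Estimate a copy number by the comparative Ct method.

    Assumes both assays amplify with ~100 % efficiency (a doubling per cycle)
    and that the calibrator carries `calibrator_copies` copies of the target.
    """
    # Normalise the target signal to the two-copy reference (RNase P) assay.
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    # Difference between the sample and the calibrator.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return calibrator_copies * 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: relative to the reference assay, the target in the
# sample crosses the threshold one cycle later than in the calibrator.
estimate = copy_number_ddct(ct_target=27.0, ct_reference=25.0,
                            ct_target_cal=26.0, ct_reference_cal=25.0)
print(estimate)  # 1.0, i.e. half the calibrator's assumed two copies
```

Because the result scales with the calibrator's assumed copy number, any deviation of the pooled calibrator from two copies at a given locus would shift all estimates for that locus proportionally.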

TA cloning

To confirm the DNA sequence, a portion of the real-time PCR products was gel purified and cloned into the T/A cloning vector pGEM-T Easy (Promega). At least five subclones were isolated and identified by direct sequencing.

Copy number validation by digital PCR

Digital PCR was available for six CNVs (Hs06535529_cn, Hs03899300_cn, Hs03908783_cn, Hs03898338_cn, Hs04090898_cn, and Hs04093415_cn) to evaluate absolute copy numbers. Digital PCR was not available for Hs03103056_cn because of difficulties in designing suitable primers and probes. To evaluate the copy number of Hs03899300_cn, we designed forward and reverse primers and a TaqMan® MGB probe for the Hs03899300_cn region and for hTERT. hTERT was used as the internal control because it is known to exist in two copies in a diploid genome [17]. The primers were 5′-TGCCTGGCACTAAGGTTTAGAGTT-3′ (forward) and 5′-CACTCAGAGGGTTAAGTGAAGTGACA-3′ (reverse) for the Hs03899300_cn region and 5′-GGGTCCTCGCCTGTGTACAG-3′ (forward) and 5′-CCTGGGAGCTCTGGGAATTT-3′ (reverse) for hTERT. The probes were 5′-FAM-TGAGTCGGTGCTTCC-MGB-3′ for the Hs03899300_cn region and 5′-VIC-CACACCTTTGGTCACTC-MGB-3′ for hTERT. We designed these primers and probes to avoid SNPs. For the other CNVs, the same Copy Number Assays used in the real-time PCR were available. Reaction mixtures of 20-μL volume comprising 1× ddPCR Master Mix (Bio-Rad), forward and reverse primers and probes for a target and a reference, and DNA were prepared. PCR amplification was performed for a total of 40 cycles with an annealing temperature of 58 °C. Digital PCR was carried out using a QX100 droplet digital PCR system (Bio-Rad) according to the manufacturer’s protocol [18].
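In droplet digital PCR, the copy number is conventionally derived from the fraction of positive droplets with a Poisson correction, and the target concentration is then scaled to a reference of known copy number. The sketch below illustrates that general calculation; it is not taken from the protocol cited above, and the function names and droplet counts are hypothetical.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson-corrected mean number of template copies per droplet."""
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    # P(droplet is negative) = exp(-lambda), so lambda = -ln(1 - p_positive).
    return -math.log(1.0 - positive / total)


def ddpcr_copy_number(target_positive, reference_positive, total,
                      reference_copies=2):
    """Copy number of the target relative to a reference of known copy number.

    Assumes target and reference are read from the same droplets (duplex
    FAM/VIC reaction) and that the reference, here hTERT, has two copies.
    """
    lam_target = copies_per_droplet(target_positive, total)
    lam_reference = copies_per_droplet(reference_positive, total)
    if lam_reference == 0:
        raise ValueError("no reference-positive droplets")
    return reference_copies * lam_target / lam_reference


# Hypothetical droplet counts for one well.
cn = ddpcr_copy_number(target_positive=2000, reference_positive=3800,
                       total=15000)
print(f"estimated copy number: {cn:.2f}")
```

The Poisson step matters because a droplet containing two template molecules is still counted once; simply taking the ratio of positive droplet counts would bias high-copy targets toward lower values.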

Statistical analysis

A Fisher’s exact test, an unpaired t test, a Mann–Whitney test, linear regression analysis, and linear discriminant analysis were used to compare variables. A P value of <0.05 was considered to be significant. Data were analyzed with GraphPad Prism version 4.03, GraphPad InStat version 3.10 (GraphPad Software), and Ekuseru-Toukei 2008 (Social Survey Research Information).
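As an illustration of the first of these tests, the sketch below implements a two-sided Fisher's exact test for a 2 × 2 table using only the Python standard library. It is not the GraphPad implementation; it assumes the common two-sided definition that sums the probabilities of all tables no more likely than the observed one. The counts in the usage example are those reported in the Results for the two 22q12.3 markers (8 and 7 of 193 patients above the copy number threshold versus none of 170 controls).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(k):
        # Probability that k of the col1 "positive" subjects fall in row 1.
        return comb(row1, k) * comb(row2, col1 - k) / denominator

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= p_observed * (1 + 1e-9))


# Rows: patients, controls. Columns: above threshold, at or below threshold.
p_hs04093415 = fisher_exact_two_sided(8, 185, 0, 170)
p_hs04090898 = fisher_exact_two_sided(7, 186, 0, 170)
print(f"{p_hs04093415:.4f}")  # about 0.0081
print(f"{p_hs04090898:.4f}")  # about 0.0160
```

Under this definition the computed values agree with the P values reported for Hs04093415_cn and Hs04090898_cn.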

Results

Using array CGH, we found four CNV regions with significant differences in the frequency of copy number changes between the patient group and the control group. The CNV positions were chr1:21,500,972-21,505,481; chr3:162,215,705-162,235,598; chr15:102,029,706-102,034,387; and chr22:37,142,958-37,147,755 (GRCh37/hg19). The CNVs detected by array CGH, however, could be false positives because a poor signal-to-noise ratio of hybridizations leads to considerable variation in the reported CGH ratio [19], and smaller CNVs are much more likely to be false positives than are large CNVs [20]. Therefore, quantitative real-time PCR with a larger cohort was carried out to confirm the CNVs associated with breast cancer susceptibility. We identified seven CNV markers related to breast cancer risk as shown in Table 1. The means of the relative copy numbers of patients with a history of breast cancer and those of women in the control group were 0.8 and 1.8 for Hs06535529_cn on 1p36.12 (P < 0.0001), 2.9 and 2.2 for Hs03103056_cn on 3q26.1 (P < 0.0001), 1.2 and 1.8 for Hs03899300_cn on 15q26.3 (P < 0.0001), 1.0 and 1.5 for Hs03908783_cn on 15q26.3 (P < 0.0001), and 1.1 and 1.7 for Hs03898338_cn on 15q26.3 (P < 0.0001), respectively (Fig. 1). The copy number of the Hs03899300_cn region on 15q26.3 by digital PCR was consistent with that by real-time PCR (Fig. 2), and the coefficient of determination (r²) was 0.9801. Copy numbers of the other CNVs by digital PCR and by real-time PCR were also well correlated: r² was 0.9201 for Hs06535529_cn, 0.8450 for Hs03908783_cn, 0.8909 for Hs03898338_cn, 0.9958 for Hs04090898_cn, and 0.9491 for Hs04093415_cn. Interestingly, nine or more copies of Hs04093415_cn on 22q12.3 were found in eight (4.1 %) patients with a history of breast cancer and in none of the controls (P = 0.0081, Fig. 1 and Table 2).
Similarly, 12 or more copies of Hs04090898_cn on 22q12.3 were found in 7 (3.6 %) patients with a history of breast cancer and in none of the controls (P = 0.0160, Fig. 1 and Table 2). After setting a copy number threshold for each marker, we evaluated the relation between the copy number events and breast cancer susceptibility. The sensitivity and specificity were 83.9 and 41.2 % for Hs06535529_cn, 39.4 and 90.0 % for Hs03103056_cn, 76.7 and 70.0 % for Hs03899300_cn, 79.8 and 45.3 % for Hs03908783_cn, 83.4 and 65.9 % for Hs03898338_cn, 4.1 and 100.0 % for Hs04093415_cn, and 3.6 and 100.0 % for Hs04090898_cn (Table 2). Linear discriminant analysis with a combination of two CNVs resulted in 80.3 % sensitivity, 80.6 % specificity, 82.4 % positive predictive value, and 78.3 % negative predictive value for the prediction of breast cancer susceptibility. The discriminant score was calculated as follows: Y = −6.9X1 + 3.2X2 + 6.1, where X1 = the copy number of Hs03899300_cn and X2 = the copy number of Hs03908783_cn.
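The discriminant score and the four performance measures can be checked with a short calculation. The sketch below evaluates the score at the reported group means and recomputes the measures from a confusion matrix. Two points are assumptions rather than statements from the text: the cutoff (Y > 0 taken to indicate the patient group) and the cell counts (155, 38, 137, and 33), which were back-calculated from the reported percentages and the group sizes of 193 patients and 170 controls.

```python
def discriminant_score(x1, x2):
    """Y = -6.9*X1 + 3.2*X2 + 6.1.

    x1: copy number of Hs03899300_cn, x2: copy number of Hs03908783_cn.
    """
    return -6.9 * x1 + 3.2 * x2 + 6.1


def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Scores at the mean relative copy numbers reported for each group.
patient_mean_score = discriminant_score(1.2, 1.0)   # positive
control_mean_score = discriminant_score(1.8, 1.5)   # negative

# Back-calculated counts (an assumption, not given in the text).
metrics = diagnostic_metrics(tp=155, fn=38, tn=137, fp=33)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f} %")
```

With these counts the four measures round to 80.3, 80.6, 82.4, and 78.3 %, matching the reported values, and the group means fall on opposite sides of zero, which is consistent with the assumed cutoff.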

Table 1 CNV markers related to breast cancer risk
Fig. 1 Distribution of copy numbers in patients with a history of breast cancer and in women in the control group. Each sample is indicated by an open circle. The horizontal lines represent the mean copy number in each group

Fig. 2 Comparison of Hs03899300_cn copy number between real-time PCR and digital PCR evaluation. Dark and light gray bars represent the copy numbers evaluated by real-time PCR and by digital PCR, respectively

Table 2 Relation between CNVs and breast cancer susceptibility

Discussion

In the current study, we identified CNV loci associated with breast cancer susceptibility. Our results, however, contrast with the study of Craddock et al. [10], who reported that there was no association between CNVs and breast cancer risk. This discrepancy is likely caused by differences in the array-CGH platforms and analytic tools used. Different calling algorithms in the analytic tools yield substantially different numbers and quality of CNV calls even when identical raw data are used as the input [21]. Differences in preprocessing, labeling, and hybridization protocols, which were performed according to the various manufacturers’ specifications, could contribute to the occurrence of false-negative and false-positive calls [22]. Therefore, comparison of data sets resulting from different platforms and/or different analytic tools will cause problems in association analysis and can create false association signals [21]. To determine a copy number accurately, array findings must be followed by a validation study using a different methodology such as real-time PCR [22].

In the current study, we found that the copy numbers of Hs03899300_cn, Hs03908783_cn, and Hs03898338_cn, which are located close to each other on 15q26.3, were similar by real-time PCR. A similar concordance was observed between Hs04093415_cn and Hs04090898_cn on 22q12.3. Furthermore, the copy numbers of six CNVs (Hs06535529_cn, Hs03899300_cn, Hs03908783_cn, Hs03898338_cn, Hs04090898_cn, and Hs04093415_cn) evaluated by digital PCR confirmed the accuracy of the data from the real-time PCR. Thus, false positives and negatives from the real-time PCR could be excluded. To our knowledge, this is the first report to show a distinct relation between CNVs and breast cancer risk.

Interestingly, 9 or more copies of Hs04093415_cn and 12 or more copies of Hs04090898_cn were observed only in patients with a history of breast cancer, and odds ratios for breast cancer susceptibility were 19.8 and 17.4, respectively. Such high odds ratios suggest a strong oncogenic effect in these regions. Because mutations of high-penetrance genes for breast cancer (BRCA1, BRCA2, TP53, and PTEN) were not tested and family history was not available in the present study, further studies are required to elucidate the association between these CNVs and hereditary breast cancer syndromes.

In the current study, some of the CNV regions related to breast cancer susceptibility contained genes such as EIF4G3 and PCSK6. Eukaryotic initiation factor 4 gamma 3 (EIF4G3) is a protein critical for initiation of protein translation [23]. To date, no relation of EIF4G3 with cancer development has been reported. We hypothesize that a decrease in the germline copy number of EIF4G3 may lead to a reduction or failure in translation of some transcripts and possibly give malignant potential to cells. Further examination will be required to test this hypothesis. Proprotein convertase subtilisin/kexin type 6 (PCSK6) is a member of the protease family of proprotein convertases that activate precursor proteins by cleaving at the specific recognition sequence RXK/RR [24]. The relation between PCSK6 expression and carcinogenesis is controversial. Some investigations reported that overexpression of PCSK6 in immortalized nontumorigenic or papilloma-derived keratinocytes increased their invasiveness [25], whereas other studies linked absent or reduced PCSK6 expression levels to ovarian cancer [26]. Regarding breast cancer, overexpression of prosegment ppPCSK6 resulted in significant enhancement in cell motility, migration, and invasion of collagen in vitro [27]. However, because the effect of a reduced copy number of PCSK6 on normal mammary gland cells has not yet been investigated, further examination will be required to understand the function of PCSK6 in the neoplastic process.

The fact that no genes were mapped to the rest of the CNV regions raises a question as to how such CNVs affect breast cancer development. A possible explanation is that new gene transcripts may exist within the CNVs. Indeed, Diskin et al. found a new gene transcript related to neuroblastoma within the 1q21.1 CNV region where no known genes had been mapped [6]. Another hypothesis is that noncoding RNAs may be involved, such as long intergenic noncoding RNAs that regulate chromatin states and epigenetic inheritance, but knowledge of the molecular mechanisms of their function is still lacking [28]. Because the function of the CNVs is still unknown, further examinations will be required.

In summary, we found several unique CNVs associated with breast cancer. These CNVs may be feasible markers for assessment of the risk of breast cancer. However, because the lifetime risk of developing breast cancer in Japan is 6 % [29], we cannot exclude the possibility that some women in the control group will develop breast cancer in the future. Confirmatory studies using independent data sets are therefore needed to support our findings.