Abstract
Extrachromosomal DNA (ecDNA) plays an important role in oncogene amplification in tumour cells and is associated with poor outcomes across multiple cancers. However, knowledge of the function of ecDNA in gastric cardia adenocarcinoma (GCA) is very limited. Here, we investigated the presence and function of ecDNA in a Chinese GCA cohort using whole-genome sequencing (WGS), whole-exome sequencing (WES), and immunohistochemistry. For the first time, we identified ecDNA amplicons in most GCA patients and found that some oncogenes are present as ecDNA amplicons in these patients. We found that oncogene ecDNA amplicons in the GCA cohort were associated with chromothripsis and may be induced by accumulated DNA damage due to local dietary habits in the geographic region. Strikingly, we observed diverse correlations between the presence of ecDNA oncogene amplicons and prognosis: ERBB2 ecDNA amplicons correlated with good prognosis, EGFR ecDNA amplicons correlated with poor prognosis, and CCNE1 ecDNA amplicons did not correlate with prognosis. Large-scale ERBB2 immunohistochemistry results from 1668 GCA patients revealed a positive correlation between the presence of ERBB2 protein and prognosis among patients surviving 2-7 years, but a negative correlation among patients surviving 0-2 years. Our observations indicate that the presence of ERBB2 ecDNA in GCA patients may represent a marker of good prognosis.
Introduction
Extrachromosomal DNA (ecDNA) was first identified more than half a century ago1, and has been associated with genomic instability2,3. With next-generation sequencing technologies and high throughput imaging platforms, an increasing number of studies have shown that ecDNAs are present in most tissues, and contribute to the intratumoral heterogeneity and cancer progression2,4-8. Using computational analysis of whole-genome sequencing (WGS) data from a large-scale cancer cohort, it has been demonstrated that the presence of ecDNA is cancer-type specific, and is associated with oncogene amplification and poor outcomes across multiple cancers7.
The cardia is located between the esophagus and the stomach. Gastric cardia adenocarcinoma (GCA) and esophageal squamous cell carcinoma (ESCC) occur together at high rates in the Taihang Mountains of north central China9-11. Gastric cancer in this area occurs primarily in the uppermost portion of the stomach and is referred to as GCA, whereas cancers in the remainder of the stomach are called gastric noncardia adenocarcinoma (GNCA)12. In Western countries, adenocarcinomas arising at the esophagogastric junction are usually classified as Siewert type II adenocarcinoma of the esophagogastric junction13-17; there, Barrett’s esophagus is very common and has been considered an important precancerous lesion of adenocarcinoma at the esophagogastric junction18. However, GCA in the Chinese population from this area has distinct features compared with GCA in Western countries11,18,19, and a very low frequency of Barrett’s esophagus is observed18. Instead, GCA in this area shares similar features with esophageal squamous cell carcinoma11,18. A previous study reported that oncogene amplification and gene rearrangements drive the progression and poor prognosis of GCA20. However, it is still unclear whether ecDNA is present in GCA, what role it plays in GCA progression, and whether it is correlated with patient prognosis. Therefore, we investigated the presence and function of ecDNA in a Chinese GCA cohort using whole-genome sequencing (WGS), whole-exome sequencing (WES), and immunohistochemistry, and explored the relationship between the presence of oncogene ecDNA amplicons and prognosis in GCA.
Results
Characterization of ecDNA amplicons in GCA
Since ecDNA can be identified from WGS data using the amplification region reconstruction tool AmpliconArchitect (AA)2,4-7,21, we first performed WGS of 36 pairs of GCA tumour and tumour-adjacent normal tissue from a high-incidence region for GCA in Henan Province, northern China (see Methods). All WGS data from the 36 pairs of samples had sufficient sequencing coverage and a high mapping rate (>95%) (Supplementary Fig. 1a, Supplementary Table 1). In addition, we performed single-nucleotide variant (SNV) analysis in the 36 GCA patients and found that the top-ranking mutated gene (81% mutation rate) was TP53 (Supplementary Fig. 1b), which agrees with previous gene mutation studies in GCA patients12,18,20,22. Then, we applied AA to the WGS data from these 36 pairs of GCA tumour and tumour-adjacent normal tissue (Fig. 1a). Following the AA pipeline, we treated the tumour-adjacent normal tissue as the background to call somatic copy number alterations (CNAs) and identified ecDNA amplicons in our GCA cohort. Using this strategy, ecDNA amplicons were identified in 28 of 36 GCA patients (Fig. 1b), and the frequency (77.8%) of ecDNA amplicons observed in our GCA cohort is similar to that of esophageal cancer (∼80%) but higher than that of gastric cancer (∼50%) in a previous report7. Moreover, the number of ecDNA amplicons identified in individual patients showed high heterogeneity across the GCA cohort (Fig. 1b), ranging from 0 to 24. For most patients, the number of ecDNA amplicons was less than 10, and only five patients had more than 10 ecDNA amplicons (Fig. 1b). In our GCA cohort, ecDNA amplicons were further classified into five categories7 (Fig. 1b, Supplementary Fig. 1c-e, Supplementary Table 2): Circular (n = 45), Complex (n = 21), Linear (n = 50), breakage-fusion-bridge (BFB) (n = 4) and Invalid (n = 31), which occurred heterogeneously across the GCA patient cohort (Fig. 1b).
We further validated the circular feature of circular ecDNA amplicons identified from AA software using another in silico method, Circle-finder, which identifies circular DNA from paired-end high-throughput sequencing data23-25. By checking the sequencing read orientation and junction points of circular ecDNA using Circle-finder, we found that 89.94-100% of circular ecDNA amplicons identified from AA contained the same junctional reads detected by Circle-finder (Supplementary Fig. 1f-h). The high proportion of overlapping circular ecDNA amplicons from Circle-finder and AA results convinced us that the ecDNA amplicons identified with AA are reliable.
Next, we analyzed the size of ecDNA amplicons in our GCA cohort. The size of ecDNA amplicons from GCA ranged from 100 Kbp to 22.6 Mbp, with a median size of 350 Kb (Supplementary Fig. 2a), where 75% of ecDNA amplicons were between 1-2 Mbp, and only 1% of ecDNA amplicons were larger than 20 Mbp (Supplementary Fig. 2b). Some large ecDNA amplicons (> 20 Mbp) could be deconvoluted into multiple potential combinations of amplicons using AA software (Supplementary Figure 2c). Since deconvolution is performed using a computational prediction, there is still the possibility that multiple structures from these large ecDNA amplicons are independent from circular amplicons. We also investigated the frequency of ecDNA amplicons in different chromosomes. We found ecDNA amplicons of different lengths in all chromosomes (Supplementary Fig. 2d, 2e) and the number of ecDNA amplicons in the different chromosomes was independent of the length of the chromosome (Supplementary Fig. 2d). We concluded that ecDNA amplicons occur heterogeneously across GCA patients (Fig. 1b, Supplementary Fig. 2e).
Next, we performed genomic annotation for all ecDNA amplicons (Fig. 1c, Supplementary Fig. 2f, 2h). We found that ecDNA amplicons occurred in different parts of the genome, including 2452 sites in protein coding regions and 579 sites in long intergenic non-protein coding RNA (lincRNA) (Fig. 1c). Notably, the frequency of ecDNA amplicons observed in coding regions (6.28%) was higher than the proportion of coding regions in the whole genome (3.48%) (Supplementary Fig. 2f). Furthermore, the proportion of ecDNA amplicons detected in exons (14.5%) was higher than that of exons in the entire genome (9.2%) (Supplementary Fig. 2g). ecDNA amplicons were also identified at regions encoding small RNAs (Fig. 1c), including miRNAs (302 sites), snRNAs (130 sites), snoRNAs (63 sites), and rRNAs (37 sites). Interestingly, we found that 82 ecDNA amplicons were from oncogenes and tumor suppressor genes (TSGs) (Fig. 1c). Next, we focused on the analysis of oncogene and TSG ecDNA amplicons in our GCA cohort (Fig. 1d). The oncogene and TSG ecDNA amplicons exhibited high heterogeneity across the GCA cohort, and the number of such amplicons per patient varied from 1 to 11 (Fig. 1d, 1e). Amplification of cyclin E1 (CCNE1) in GCA was observed in a previous report26. Specifically, we found that CCNE1 ecDNA amplicons occurred in 11 patients in our cohort (Fig. 1d). ERBB2 is a member of the human epidermal growth factor receptor family (EGF family), and it has been reported that ERBB2 amplification plays an important role in GCA progression26. We found that four patients had ERBB2 ecDNA amplicons (Fig. 1d). CDK12, EGFR and MYC were also found in ecDNA form in more than three patients in the cohort (Fig. 1d). ERBB2 is also known as HER2, and EGFR is also called HER1 or ERBB127. Both HER1 and HER2 are members of the EGF family. The identification of HER1 ecDNA and HER2 ecDNA in GCA reflects the role of the EGF family in GCA progression28.
However, we did not observe co-detection of HER1 ecDNA amplicons and HER2 ecDNA amplicons in the same GCA patient (Fig. 1d), which likely reflects the heterogeneity of our GCA cohort. The frequent detection of ecDNA amplicons in The Cancer Genome Atlas (TCGA) reflects the presence of cancer-specific oncogene ecDNA amplicons in each cancer type7, including gastric cancer and esophageal cancer. Since the cardia is located at the junction of the esophagus and stomach, we next investigated whether the list of ecDNA amplicons from GCA was similar to that of gastric cancer or esophageal cancer in the TCGA report. We found that GCA shares some common oncogene ecDNA amplicons with both gastric cancer and esophageal cancer, including CCNE1, EGFR, and MYC (Supplementary Fig. 3). The top two ranking ecDNA amplicons, ERBB2 and CCNE1, were the same in both gastric cancer and GCA. However, the top-ranking list of oncogene ecDNA amplicons differed between esophageal cancer and GCA (Supplementary Fig. 3): CCND1 and EGFR were the top two ranking oncogene amplicons in esophageal cancer. Our results indicate that the top oncogene ecDNA amplicons from GCA are more similar to those from gastric cancer. In addition, we observed that several oncogene and TSG ecDNA amplicons appear in the same GCA patient (Fig. 1d). Circularized oncogene ecDNA is reported to be highly amplified through a rolling-circle replication mechanism, and circular ecDNA can contain different oncogenes from different regions of the genome2. Thus, we examined whether these different oncogenes and TSGs in the same patient were located in the same ecDNA amplicon. We first divided the highly amplified regions into segments, joined them according to read orientation and read junctions, and then reconstructed circular ecDNA containing multiple oncogenes and/or TSGs (Fig. 1d-f, Supplementary Fig. 4a-d, Supplementary Table 3).
We referred to the presence of two or more oncogenes and/or TSGs in the same ecDNA amplicon as oncogene ecDNA co-amplification (Fig. 1d), and investigated the frequency of such occurrences (Fig. 1d, 1e). We found that i) co-amplification of oncogenes occurred in 50% of patients (18 of 36 patients) (Fig. 1e, Supplementary Fig. 4a); ii) the frequency of oncogene ecDNA co-amplification varied from 50% to 100% of all oncogene amplifications in different patients (Fig. 1e); and iii) some oncogene ecDNA co-amplifications were observed in more than one patient (Supplementary Table 3): the oncogene and TSG pairs ERBB2 and CDK12, RARA and SMARCE1, and CBLC and BCL3 occurred in 3 patients, and the pairs EGFR and IRF4, and PPARG and RAF1, as well as the combination of CDK12, ERBB2 and RARA, occurred in 2 patients. Interestingly, EGFR and CDK6, which are separated by a physical distance of 40 Mbp, were located in the same circular ecDNA (Fig. 1f). Using the normal genome copy number as the background, we found that the EGFR and CDK6 circular ecDNA was amplified 40-fold compared to other parts of the genome (Fig. 1f). The co-amplification of EGFR and CDK6 in the same ecDNA amplicon indicates that different genes could work together during the progression of GCA.
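The co-amplification tally described above can be illustrated with a minimal sketch. It assumes each patient's reconstructed amplicons are already available as lists of gene names; the patient and amplicon identifiers, the gene assignments, and the definition of the per-patient fraction are all hypothetical and are not taken from Supplementary Table 3.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: patient -> amplicon id -> genes carried on that amplicon.
# Identifiers and gene assignments are invented for illustration only.
amplicons = {
    "P01": {"amp1": ["ERBB2", "CDK12"], "amp2": ["MYC"]},
    "P02": {"amp1": ["ERBB2", "CDK12", "RARA"]},
    "P03": {"amp1": ["EGFR", "CDK6"], "amp2": ["CCNE1"]},
}

def coamplified_pairs(amplicons_by_patient):
    """Count, for each gene pair, how many patients carry both genes on one amplicon."""
    pair_patients = Counter()
    for amps in amplicons_by_patient.values():
        seen = set()  # each pair is counted at most once per patient
        for genes in amps.values():
            seen.update(combinations(sorted(set(genes)), 2))
        pair_patients.update(seen)
    return pair_patients

def coamplification_fraction(amps):
    """Assumed definition: share of a patient's amplicons carrying two or more genes."""
    if not amps:
        return 0.0
    multi = sum(1 for genes in amps.values() if len(set(genes)) >= 2)
    return multi / len(amps)

pairs = coamplified_pairs(amplicons)
fraction_p01 = coamplification_fraction(amplicons["P01"])
```

Gene pairs are stored as alphabetically sorted tuples so that (ERBB2, CDK12) and (CDK12, ERBB2) are counted as the same pair.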
Validation of ecDNA amplicons using Circle-seq
To further evaluate the accuracy of ecDNA amplicon prediction by AmpliconArchitect, we chose 10 pairs of GCA samples from our cohort to perform ecDNA sequencing with Circle-seq29 (see Methods, Supplementary Fig. 5a). We performed ecDNA peak calling from Circle-seq using adjacent normal tissue as the control30. Among the 10 GCA patients selected for Circle-seq, seven were ecDNA amplicon positive by WGS prediction (Fig. 1b), and ecDNA amplicons (ranging from 491 to 39020) were identified in all of them using Circle-seq (Supplementary Fig. 5b). Then, we checked the overlap between ecDNA segments from Circle-seq and predicted ecDNA amplicons from the WGS in the seven pairs of GCA samples. We found that most ecDNA amplicons identified in the WGS appeared in the Circle-seq peaks: 100% of WGS ecDNA in four GCAs, more than 80% of WGS ecDNA in two GCAs, and 50% of WGS ecDNA in one GCA were confirmed by Circle-seq (Fig. 2a). Since CCNE1 was the most frequently detected ecDNA amplicon across the cohort, we determined the detailed structure of CCNE1 in Circle-seq (Supplementary Fig. 5c). We found that there was a clear enrichment of CCNE1 in two GCAs in both Circle-seq and WGS, and that both had a similar tendency for amplification (Supplementary Fig. 5c). However, there was no CCNE1 amplification in the normal samples, in either WGS or Circle-seq, indicating that our ecDNA amplicon detection by AmpliconArchitect prediction from the WGS data is reliable. The AA computational tool not only predicted the ecDNA amplicon, but also provided the structure of the ecDNA amplicon. Upon closer comparison of the fine structure of ecDNA amplification between the WGS and Circle-seq, we found that the fine structure was not always the same (Fig. 2b). The FGFR2 ecDNA amplicon exhibited highly amplified segments with fluctuations in the WGS prediction but not in the Circle-seq detection (Fig. 2b).
The difference in the fine structure from WGS and Circle-seq likely reflects the technical bias of the ecDNA amplicon prediction from the WGS and library preparation from the Circle-seq.
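The per-patient confirmation rates quoted above amount to asking what fraction of WGS-predicted amplicon intervals are hit by at least one Circle-seq peak. A minimal sketch of that calculation follows; the any-overlap criterion, the half-open interval convention, and all coordinates are assumptions made for illustration, since the exact overlap rule is not stated in this section.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def confirmed_fraction(wgs_amplicons, circleseq_peaks):
    """Fraction of WGS-predicted amplicon intervals hit by at least one Circle-seq peak."""
    if not wgs_amplicons:
        return 0.0
    hits = sum(
        any(overlaps(amp, peak) for peak in circleseq_peaks)
        for amp in wgs_amplicons
    )
    return hits / len(wgs_amplicons)

# Toy coordinates, invented for illustration (not taken from the cohort).
wgs = [("chr19", 29_000_000, 30_500_000), ("chr7", 54_000_000, 56_000_000)]
peaks = [("chr19", 29_800_000, 29_900_000), ("chr8", 127_000_000, 127_800_000)]

frac = confirmed_fraction(wgs, peaks)
```

A stricter rule, such as requiring a minimum reciprocal overlap, would lower the confirmed fraction; the any-overlap rule here is simply the most permissive choice.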
EcDNA amplicons in GCA are associated with chromothripsis
Even though ecDNA amplicons are widely detected in different types of cancer, the sources of ecDNA amplification remain unknown. It has been reported that chromothripsis contributes to cancer progression and drives ecDNA amplification in cancer3,31,32, and that some ecDNA amplicons are generated during chromothripsis process2. Next, we aimed to understand the relationship between chromothripsis and ecDNA amplicons in our GCA cohort. We used the ShatterSeek package33 to identify chromothripsis events across the 36 GCA patients (Supplementary Fig. 6a). Strikingly, we found that chromothripsis occurred in 34 GCA patients across our cohort (Supplementary Fig. 6b). We also divided the chromothripsis events into fine categories with the parameters of high confidence (HC) and low confidence (LC) (see Methods). This revealed that HC chromothripsis occurred in 61.1% of GCAs across the cohort, and LC chromothripsis occurred in 88.9% of all GCA samples. We found that the frequency of chromothripsis in GCA patients was quite diverse across the cohort, where the range of chromothripsis was from 0 to 4 for HCs and 0 to 14 for LCs (Supplementary Fig. 6c). The location of the chromothripsis events in the genome was also quite heterogeneous across the cohort (Fig. 3a). When we aligned chromothripsis events and ecDNA amplicons on the genome browser, we observed a clear overlap between ecDNA amplicons and chromothripsis at some of the oncogene ecDNA loci, including the ERBB2 and MYC genes (Fig. 3b, Supplementary Fig. 7). To further explore the relationship between chromothripsis and ecDNA amplification, we quantified the number of ecDNA amplicons that overlapped with chromothripsis (Fig. 3c). The results showed that 17.22% of ecDNA amplicons occurred in HC chromothripsis, and 15.89% occurred in LC chromothripsis. Taken together, these results indicate that 33.11% of ecDNA amplicons might be caused by chromothripsis (Fig. 3c). 
To further determine the relationship between ecDNA amplicon and chromothripsis, we calculated the correlation between the number of chromothripsis events and the total length of all ecDNA (Fig. 3d). The results clearly demonstrated a positive correlation between ecDNA amplicons and chromothripsis events (Pearson’s correlation = 0.42). Our results indicate the ecDNA amplicons in GCAs are more likely to occur due to chromothripsis, and that such events could contribute to GCA progression if the chromothripsis event occurs at the oncogene site.
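For readers unfamiliar with the statistic, the correlation reported above is the standard Pearson coefficient between two per-patient quantities. The sketch below shows the computation; the per-patient values are invented placeholders, so it is not expected to reproduce the reported value of 0.42.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-patient values: chromothripsis event counts and
# total ecDNA length in Mbp (placeholders, not cohort data).
chromothripsis_events = [0, 1, 2, 3, 4]
total_ecdna_mbp = [0.0, 1.5, 1.0, 4.0, 3.5]

r = pearson_r(chromothripsis_events, total_ecdna_mbp)
```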
Comprehensive analysis of chromothripsis using large-scale samples of human cancers from TCGA showed that the frequency of chromothripsis is greater than 50% in several cancer types34. However, the frequency of chromothripsis in our GCA cohort was 94% (Fig. 3a), which is extremely high. Previous reports have shown that chromothripsis is associated with genomic instability and DNA damage35-39. Thus, we investigated potential risk factors contributing to such a high frequency of chromothripsis in our GCA cohort by analyzing genome stability and DNA damage. First, we performed microsatellite instability (MSI) detection by immunohistochemistry (IHC) staining of four proteins (MLH1, MSH2, MSH6 and PMS2)40,41. We found that only 9 of 36 samples were MSI-high (Supplementary Fig. 8a, 8b, Supplementary Table 4), and 27 were MSI-low. The two chromothripsis-negative samples were both in the MSI-low group (Supplementary Fig. 8b), and there was no correlation between MSI grade and chromothripsis events (Supplementary Fig. 8b, p = 1, Fisher’s exact test). Thus, we concluded that the high frequency of chromothripsis is not likely due to a high proportion of MSI-high samples in our cohort. Second, we calculated chromosomal instability (CIN) for all 36 samples in accordance with a previous report42 and divided GCA patients into four groups based on the genome integrity index (from low to high: 0 to 0.2, 0.2 to 0.4, 0.4 to 0.6, 0.6 to 0.8) (see Methods). We found only 2 samples in the high-grade CIN group (Supplementary Fig. 8c, Supplementary Table 4). The two chromothripsis-negative samples were in the low-grade CIN group (Supplementary Fig. 8c), and there was no correlation between CIN grade and chromothripsis events (Supplementary Fig. 8c, p = 0.381, Fisher’s exact test). Thus, we concluded that the high frequency of chromothripsis is not likely due to a high proportion of high-grade CIN in our cohort.
Third, we performed IHC staining of γH2AX protein, a crucial biomarker for the detection of DNA double strand breaks43, in our GCA cohort. We found that 80.55% (29/36) of GCA patients were γH2AX protein positive (Fig. 3e, 3f, Supplementary Table 4). The two chromothripsis-negative samples were both γH2AX protein negative (Fig. 3f), and there was a significant correlation between the presence of γH2AX and chromothripsis events (Fig. 3f, p = 0.033, Fisher’s exact test). We also found that the total length of chromothripsis in γH2AX protein-positive patients was significantly longer than that in γH2AX protein-negative patients (Fig. 3g, p = 0.025). Thus, we suspect that the high frequency of chromothripsis is most likely due to the high degree of DNA damage that has accumulated in GCA patients. All GCA patients in our study were from the high incidence area for GCA in Henan Province, northern China9, where the intake of nitrosamine-rich foods, such as pickled vegetables, has been well recognized as one of the key risk factors for GCA44. Accumulating evidence has demonstrated that nitrosamine is a very important factor for DNA alkylation, synthesis disorder, high instability and even DNA double strand breaks45-50. Thus, we suspected that nitrosamine exposure in our GCA cohort may accumulate DNA damage, potentially inducing a high frequency of chromothripsis. As ecDNA amplicons in our GCA cohort are more likely to occur due to chromothripsis, as stated above, and it was also proposed that chromothripsis is a primary mechanism that accelerates genomic DNA rearrangement and amplification into ecDNA by a recent study3, our data suggest that local dietary habits from the geographic region in our cohort may contribute to ecDNA occurrence in GCA patients.
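The Fisher's exact tests above can be checked by hand from the counts given in the text. The following is a minimal sketch, assuming a two-sided test on 2x2 tables reconstructed from the stated numbers (34 of 36 patients chromothripsis positive; 29 γH2AX positive and 7 negative; 9 MSI-high and 27 MSI-low; both chromothripsis-negative samples falling in the γH2AX-negative and MSI-low groups). The function name and table layout are illustrative and do not represent the authors' analysis code.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum every table at most as probable as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )

# Rows: gammaH2AX positive / negative; columns: chromothripsis present / absent.
p_h2ax = fisher_exact_two_sided(29, 0, 5, 2)

# Rows: MSI-high / MSI-low; columns: chromothripsis present / absent.
p_msi = fisher_exact_two_sided(9, 0, 25, 2)
```

With these reconstructed tables the sketch gives p ≈ 0.033 for γH2AX and p = 1 for MSI, in agreement with the values reported in the text. The CIN comparison cannot be reconstructed in the same way because the per-group counts are not given here.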
The presence of oncogene ecDNA does not increase the mutation frequency in GCA
Oncogene amplification is a key factor contributing to human cancer51. A high frequency of oncogene mutations has also been reported in GCA20,22. Since both oncogene amplification (Fig. 1d) and oncogene mutations (Supplementary Fig. 1b) were observed in our GCA cohort, we investigated whether there was a high frequency of oncogene mutations in the region of ecDNA oncogene amplicons. We calculated numbers of SNVs in the whole genome as well as in only ecDNA amplicon present regions (Supplementary Fig. 9a) and found mutation frequency in the ecDNA amplicon regions occur at a similar level as in the whole genome from most patients, except for two GCA samples (Supplementary Fig. 9a). Statistical analysis showed that there was no significant difference in mutation frequency between ecDNA amplicon regions and the whole genome in our GCA cohort (Supplementary Fig. 9b, p = 0.18). We also compared the numbers of SNVs in regions of individual oncogene or TSG ecDNA regions (same oncogene or TSG ecDNA observed in 2 or more patients) between present and absent oncogene ecDNA patients (Supplementary Fig. 9c) and found that there were significantly more SNVs in the ecDNA present group only with respect to the BIRC3 gene (Supplementary Fig. 9c, p = 0.031) but not at other oncogenes (Supplementary Fig. 9c). Thus, we concluded that there may be no relationship between oncogene mutations and the presence of oncogene ecDNA amplicons in GCA patients.
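Comparing mutation frequency between ecDNA amplicon regions and the whole genome requires normalizing SNV counts by region length, since the two differ in size by orders of magnitude. The sketch below assumes a simple SNVs-per-megabase normalization, which this section does not spell out; the counts and lengths are invented placeholders.

```python
def snv_density_per_mb(snv_count, region_bp):
    """SNVs per megabase for a region of the given length in base pairs."""
    if region_bp <= 0:
        raise ValueError("region length must be positive")
    return snv_count / (region_bp / 1_000_000)

# Hypothetical values for one patient (placeholders, not cohort data).
genome_density = snv_density_per_mb(snv_count=30_000, region_bp=3_000_000_000)
ecdna_density = snv_density_per_mb(snv_count=25, region_bp=2_500_000)

# A ratio near 1 would indicate a similar mutation frequency inside and
# outside ecDNA amplicon regions for this patient.
ratio = ecdna_density / genome_density
```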
The presence of oncogene ecDNA amplicons shows diverse correlations with the prognosis of GCA
It was reported that the presence of ecDNA is associated with oncogene amplification and poor outcomes across multiple cancers7. Thus, we investigated the relationship between oncogene amplification, the presence of ecDNA and patient prognosis in our GCA cohort. We first explored the relationship between oncogene amplification and GCA patient prognosis by focusing on the top 11 high frequency of oncogenes and TSGs ecDNA amplicons. We found that most of the top 11 high frequency oncogene amplifications across the cohort with a copy number (CN) greater than 5 came from ecDNA amplicons (Supplementary Fig. 10). We compared the gene copy numbers and patient survival time by splitting the gene amplification into different groups (High, Low, Normal) (Supplementary Fig. 10). As expected, the survival time in some GCA patients after surgery was shorter in those with a high copy number of certain oncogenes, including EGFR, MYC, and BIRC3 (Supplementary Fig. 10). Surprisingly, we found that patients with a low CN amplification of CCNE1 and ERBB2 survived for a shorter period compared to those with a normal gene CN (Supplementary Fig. 10), and patients survived even longer with a high CN of CCNE1 and ERBB2 amplification (Supplementary Fig. 10). To further investigate our observation, we performed a correlation study between different ranges of CN amplification and survival time from the CCNE1, ERBB2, and EGFR genes (Fig. 4a). The results indicated that the short survival time was due to the high range of oncogene amplification in EGFR. However, for ERBB2 and part of the sample of CCNE1, the tendency was completely opposite. Specifically, we found that four samples with a high CN of CCNE1, caused by ecDNA amplicons, exhibited an average survival time of 5.08 years, and all samples with a high CN of ERBB2 had an average survival time of 6.59 years (Fig. 4a).
Furthermore, we focused on investigating the relationship between prognosis and the CN of three oncogenes: CCNE1, ERBB2, and EGFR. EGFR followed the expected tendency: patients with high-range oncogene amplification had a shorter survival time than those with low-range amplification (p = 0.0013) (Fig. 4b). The relationship between EGFR copy number and patient survival time reflects oncogene function in GCA tumorigenesis. For both ERBB2 and CCNE1, we found that patients with low-range amplification had the worst prognosis compared to those with normal and high-range amplification (Fig. 4b). To our surprise, patients with high-range amplification of CCNE1 and ERBB2 had the best prognosis compared to those with low- and middle-range amplification (Fig. 4b). To further confirm the relationship between oncogene amplification and patient survival, we performed WES on an independent cohort of 39 GCA patients together with our cohort of 36 GCA patients (Supplementary Fig. 11a, Supplementary Table 5). First, the copy numbers of ERBB2 from WGS in the 36 patients were very similar to the copy numbers detected in the WES data (Supplementary Fig. 11b), which indicates that the WES data could be used to validate our WGS observation of ERBB2 gene amplification. Next, we focused on the WES data for 75 GCA patients and observed a similar tendency, namely, that high-range ERBB2 amplification was correlated with increased survival time (Supplementary Fig. 11c, Supplementary Table 6). Taken together, we concluded that our observation is independent of the specific GCA cohort. A negative correlation between oncogene amplification and patient prognosis has previously been reported in many independent studies, including large group studies in the TCGA7. We found a similar tendency for some oncogenes in GCA, such as EGFR. The negative correlation holds for low-range amplification of ERBB2 and CCNE1 (Fig. 4b); however, the correlation becomes positive when these two genes undergo high-range amplification (Fig. 4b).
Next, we investigated the relationship between the presence of oncogene ecDNA amplicons and patient prognosis by dividing patients into ecDNA present and absent groups (Fig. 4c), and we found diverse correlations of present oncogene ecDNA amplicons and patient survival. In brief, we found no significant difference in prognosis for the absence and presence of CCNE1 ecDNA amplicons (Fig. 4c, p = 0.55); the presence of EGFR ecDNA amplicons had a negative correlation with patient prognosis (Fig. 4c, p = 0.036); and the presence of ERBB2 ecDNA amplicons had a positive correlation with patient prognosis (Fig. 4c, p = 0.0068). To understand whether our observation was due to clinicopathological factors from GCA patients, we first investigated the relationship between clinicopathological phenotypes and prognosis in GCA (Methods, Supplementary Fig. 12, Supplementary Table 4). We found that UICC tumour stage was the only clinicopathological factor correlated with GCA survival (Supplementary Fig. 12i). Next, we performed survival analysis using clinicopathological variables of patients together with the presence of ecDNA amplicons (ERBB2, EGFR, CCNE1) by dividing patients into those with and without ecDNA amplicons (Supplementary Fig. 12). We found that the presence of ERBB2 ecDNA amplicons may be relevant to the UICC tumour stage but not to other clinicopathological variables (Supplementary Fig. 12). However, the presence of EGFR and CCNE1 ecDNA amplicons was not relevant to any clinicopathological variables (Supplementary Fig. 12). Since both UICC tumour stage (Supplementary Fig. 12i) and the presence of ERBB2 ecDNA (Fig. 4c) are contributing factors to patient survival, we assumed that there might be some connection between the presence of the ERBB2 ecDNA amplicon and GCA stage. However, our sample size was too small (36 cases) to obtain further conclusions. 
It will be very interesting to perform further studies with larger sample sizes of patients to obtain additional conclusions in the future.
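Survival comparisons between ecDNA-present and ecDNA-absent groups of this kind are commonly visualized with Kaplan-Meier curves. The sketch below shows how such a curve is estimated for one group, assuming right-censored follow-up data; the choice of estimator is an assumption, as the method is not named in this section, and the follow-up times are invented.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time for each patient (e.g. years after surgery)
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability) at each distinct death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up data for one group (placeholders, not cohort data).
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
```

Computing one curve per group (for example, ERBB2 ecDNA present versus absent) and comparing them with a log-rank test would be the usual next step; that test is not implemented here.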
The positive correlation between the presence of ERBB2 ecDNA in GCA and patient prognosis contrasts with large-scale TCGA studies in many cancer types7, where the presence of ecDNA amplicons was shown to be associated with poor outcomes. Since a paradoxical relationship between chromosomal instability and survival outcomes in cancer has been reported42, we examined whether the positive correlation between the presence of ERBB2 ecDNA amplicons and patient prognosis is due to chromosomal instability (CIN) in our GCA cohort. The survival analysis of the four CIN groups (Methods, Supplementary Fig. 8c) shows that GCA patients with stable chromosomes survived longer than patients with unstable chromosomes in our cohort (Supplementary Fig. 13a). However, samples with ERBB2 ecDNA amplicons were not enriched in any specific CIN group (Supplementary Fig. 13b), and we did not observe a significant difference in CIN values between ecDNA-present and ecDNA-absent samples (Supplementary Fig. 13c, p = 0.33). Thus, we concluded that the paradoxical relationship between the presence of ERBB2 ecDNA amplicons in GCA patients and survival outcome is independent of CIN. Because a recent study showed that the chromatin structure of ecDNA is highly accessible52, we assumed that the ERBB2 gene is highly expressed in ecDNA-present GCA patients. It was also reported that amplification of the ERBB2 gene is accompanied by ERBB2 gene overexpression in the same GCA tissue18,53-55. In addition, we observed a positive correlation between ERBB2 gene expression and ERBB2 protein expression in GCA patients (n = 44) (Supplementary Fig. 14a, R = 0.79, Supplementary Table 7). Thus, we hypothesized that protein levels of ERBB2 were also high in ERBB2 ecDNA-present patients, and that a high level of ERBB2 protein would be positively associated with GCA prognosis.
To test our hypothesis, we performed immunohistochemistry of the ERBB2 protein from 1668 GCA patients (with 0-to 7-year survival time after surgery) (see Methods, Supplementary Fig. 14b, Supplementary Table 8). Although we did not observe a significant difference in patient prognosis among all patients (n = 1668, Supplementary Fig. 14c, p = 0.16), there was a significant difference in patient prognosis in patients surviving between 0-2 years (including 2 years) after surgery (n = 750, Fig. 4d, p = 0.016) and in patients surviving between 2-7 years (n = 918, Fig. 4d, p = 0.025). We concluded that there is a positive correlation between ERBB2 protein presence and patient prognosis in 2-7 year survival after surgery, and there is a negative correlation between ERBB2 protein presence and patient prognosis in the 0-2 year survival after surgery. It was reported ERBB2 protein expression and gene amplification correlate with better survival in esophageal adenocarcinoma56, and the positive correlation between the presence of ERBB2 protein and increased patient survival (2-7 years of survival) in our GCA cohort likely also reflects the similarity between esophageal adenocarcinoma features and GCA. Since we assumed that the protein level of ERBB2 is high in ERBB2 ecDNA-positive patients, our observation indicates that the ERBB2 ecDNA amplicon may represent a good prognostic marker in GCA patients.
Discussion
In summary, for the first time, we identified ecDNA amplicons in GCA patients using WGS data, and validated these ecDNA amplicons using Circle-seq. We found that these ecDNA amplicons are present in most GCA patients and exhibit heterogeneity across different GCA patients. Additionally, for the first time, we found that several oncogenes are present in the form of ecDNA amplicons in GCA patients and that different oncogenes can coamplify in the same ecDNA amplicon. Interestingly, we found that oncogene ecDNA amplicons were associated with a high frequency of chromothripsis in our GCA cohort, and such a high frequency of chromothripsis in our cohort is likely due to a high degree of DNA damage induced by nitrosamine exposure from a local diet45-50. We propose that local dietary habits in the geographic region may have contributed to ecDNA occurrence in our GCA cohort. It was reported that ecDNA is a major mechanism of drug resistance in several tumour types3; thus, it will be valuable to combine clinical annotation of previous therapy exposure with ecDNA detection in large-scale samples of GCA patients to design therapy strategies for GCA patients in the future.
Strikingly, we found that the correlation between the presence of oncogene ecDNA amplicons and patient prognosis differed depending on the gene in GCA patients: ERBB2 ecDNA amplicons correlated with good prognosis, EGFR ecDNA amplicons correlated with poor prognosis, and CCNE1 ecDNA amplicons did not correlate with prognosis. The relationship between the presence of ecDNA and prognosis in GCA reported in this study differs from a previous report indicating that oncogene ecDNA amplicons correlate with poor prognosis in other cancers from TCGA7, and our observation likely reflects the heterogeneous nature of cancers. These diverse associations between oncogene ecDNA amplification and prognosis may aid in designing better personalized therapy strategies for GCA patients in the future. Large-scale ERBB2 immunohistochemistry results from 1668 GCA patients demonstrated that there was a positive correlation between ERBB2 protein presence and patient prognosis in 2-7-year survival after surgery; however, there was a negative correlation between ERBB2 protein presence and patient prognosis in 0-2-year survival after surgery. This paradoxical relationship between ERBB2 protein presence and prognosis is similar to a previous report on the relationship between ERBB2 protein expression and improved survival in esophageal adenocarcinoma56, which likely reflects the similarity in features between esophageal adenocarcinoma and GCA. Since we assumed that the protein levels of ERBB2 are high in ERBB2 ecDNA-positive patients, our observation indicates that the ERBB2 ecDNA amplicon may represent a good prognostic marker in GCA patients.
Author contributions
L.D.W. and X.C. conceived and designed the study; X.K.Z., X.S., L.L.L., R.H.X., W.L.H., P.P.W., and F.Y.Z. contributed to the collection of the patient materials and clinical information; X.K.Z., X.S., M.M.Y., J.F.H., and K.Z. prepared the WGS and ecDNA sequencing of GCA; P.X. performed all the sequencing data mining; L.Z., Y.D., L.L.L., X.N.H., C.L.M., and J.J.J. were responsible for the ERBB2, γH2AX, and MSI protein staining in GCA and for the analysis of its relationship with GCA survival; M.Z., X.K.Z., P.X., L.D.W., and X.C. wrote the manuscript with input from all authors; and L.D.W. and X.C. supervised all aspects of this work.
Competing Financial Interests statements
The authors declare no competing financial interests.
Availability of the data
All raw data are deposited in the China National Center for Bioinformation under accession number HRA00081
Supplementary Figures 1-14 are provided in a separate file.
Supplementary Tables 1-8 are provided in a separate file.
Materials and Methods
GCA sample collection and patient follow-up
All clinical samples were collected under ethical permits from local hospitals located in high-incidence areas of GCA in the Taihang Mountains of north central China. None of the patients in our study received radiotherapy or chemotherapy before surgery. The 1668 GCA patients used for ERBB2 immunohistochemistry (IHC) staining are from the Esophageal Cancer database (from the years 1973-2020), which was established and is maintained by the Henan Key Laboratory for Esophageal Cancer Research of the First Affiliated Hospital, Zhengzhou University, China1-4. In our Esophageal Cancer database, the GCA tumor and matched normal tissue of each patient are both preserved by snap freezing in liquid nitrogen and archived as formalin-fixed paraffin-embedded (FFPE) tissue blocks. Snap-frozen samples were used for whole-genome sequencing (WGS), whole-exome sequencing (WES), RNA-seq and protein expression measurement by mass spectrometry; FFPE samples were used for IHC staining. The diagnosis of each GCA patient was made by two well-trained pathologists in the pathology department of the local hospital, where hematoxylin and eosin (HE) staining was used to quantify the tumor cell content of tissue sections; only GCA samples with more than 80% tumor cells were used in our study. The matched normal tissue samples were taken from adjacent epithelial tissue 5-10 cm away from the edge of the tumor. Both the 36 pairs of GCA tumor and matched adjacent normal tissue used for WGS and the 75 pairs used for WES were examined and confirmed by two well-trained pathologists following the same procedure. The complete clinicopathological information of all patients was recorded and included in our study.
All patients were included in a regular follow-up plan with the following frequency: once every three months during the first year, once every six months during the second year, and once per year after the third year. Overall survival time was defined as the period from diagnosis to death for deceased patients, and as the period from diagnosis to the last follow-up visit (Jan 2021) for living patients.
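The overall survival definition above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name is hypothetical, and the cut-off of 31 January 2021 is an assumption, since the text states only "Jan 2021".

```python
from datetime import date
from typing import Optional, Tuple

# Assumption: the text gives only "Jan 2021" for the last follow-up visit.
LAST_FOLLOW_UP = date(2021, 1, 31)


def overall_survival_days(
    diagnosis: date,
    death: Optional[date],
    last_follow_up: date = LAST_FOLLOW_UP,
) -> Tuple[int, bool]:
    """Return (survival time in days, event flag).

    The event flag is True when death was observed and False when the
    patient was alive at the last follow-up (a censored observation).
    """
    end = death if death is not None else last_follow_up
    if end < diagnosis:
        raise ValueError("end date precedes diagnosis date")
    return (end - diagnosis).days, death is not None
```

The event flag is what a Kaplan-Meier analysis needs to distinguish deaths from censored follow-up times.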
WGS library preparation and sequencing
WGS libraries were prepared following a previous report with slight modifications5. In brief, genomic DNA was extracted from snap-frozen GCA tumor and matched normal tissue with the DNeasy Blood & Tissue Kit (69504, QIAGEN) following the manufacturer’s instructions. DNA concentration was measured with the Qubit DNA Assay Kit in a Qubit 2.0 Fluorometer (Invitrogen). A total of 0.4 µg DNA per sample was fragmented to an average size of ∼350 bp with a hydrodynamic shearing system (Covaris, Massachusetts, USA) and subjected to DNA library preparation with the Illumina TruSeq DNA sample preparation kit (15026486, Illumina). Sequencing was carried out on an Illumina NovaSeq 6000 in 150 bp paired-end mode according to the manufacturer’s instructions.
WES library preparation and sequencing
WES libraries were prepared following a previous report with slight modifications6. In brief, genomic DNA was extracted from snap-frozen GCA tumor or matched normal tissue using the DNeasy Blood & Tissue Kit (69504, QIAGEN) according to the manufacturer’s instructions. DNA degradation and contamination were monitored on 1% agarose gels. DNA concentration was measured with the Qubit DNA Assay Kit in a Qubit 2.0 Fluorometer (Invitrogen). A total of 0.6 µg genomic DNA per sample was fragmented to an average size of 180∼280 bp and subjected to DNA library preparation using the Illumina TruSeq DNA sample preparation kit. The Agilent SureSelect Human All ExonV5 Kit (5190-6209, Agilent Technologies) was used for exome capture according to the manufacturer’s instructions. In brief, DNA libraries were hybridized in liquid phase with biotin-labeled probes from the Agilent SureSelect Human All ExonV5 Kit, and magnetic streptavidin beads were then used to capture the exons of genes. Captured DNA fragments were enriched in a PCR reaction with index barcodes for sequencing. Final libraries were purified using AMPure XP beads (A63880, Beckman Coulter) and quantified using the Agilent high sensitivity DNA kit (5067-4626, Agilent Technologies). WES libraries were sequenced on an Illumina NovaSeq 6000 (Illumina) in 150 bp paired-end mode according to the manufacturer’s instructions.
Circle-Seq library preparation and sequencing
The ecDNA sequencing service was provided by CloudSeq Biotech Inc. (Shanghai, China) following published procedures with slight modifications7. Circle-Seq was performed on 10 pairs of snap-frozen GCA tumors and matched normal tissues. In brief, 6 mg of snap-frozen GCA tumor or matched normal tissue was suspended in L1 solution (A&A Biotechnology, 010-50) supplemented with 15 μl proteinase K (ThermoFisher, E00491) and incubated overnight at 50 °C with agitation. After lysis, samples were alkaline treated, followed by precipitation of proteins and separation of chromosomal DNA from circular DNA through an ion exchange membrane column (Plasmid Mini AX; A&A Biotechnology, 010-50). Column-purified DNA was treated with FastDigest MssI (ER1341, Thermo Scientific) at 37 °C for 16 h to remove mitochondrial circular DNA. Remaining linear DNA was removed by exonuclease (E3101K, Plasmid-Safe ATP-dependent DNase, Epicentre) at 37 °C in a heating block; the enzyme reaction was carried out continuously for 1 week, with additional ATP and DNase added every 24 h (30 units per day) according to the manufacturer’s protocol. ecDNA-enriched samples were used as templates for phi29 polymerase amplification reactions (150043, REPLI-g Midi Kit), amplifying ecDNA at 30 °C for 2 days (46–48 h). Phi29-amplified DNA was sheared by sonication (Bioruptor), and the fragmented DNA was subjected to library preparation with the NEBNext® Ultra II DNA Library Prep Kit for Illumina (E7645S, New England Biolabs). Sequencing was carried out on an Illumina NovaSeq 6000 in 150 bp paired-end mode.
ERBB2 RNA expression measurement and ERBB2 protein expression measurement in GCA patients
ERBB2 RNA expression and ERBB2 protein expression were measured in 44 GCA patients from our Esophageal Cancer database (from the years 1973-2020). ERBB2 RNA expression (normalized as RPKM, reads per kilobase of transcript per million mapped reads) was extracted from RNA-seq data, and ERBB2 protein expression was extracted from mass spectrometry (MS) data. For each GCA patient, both an RNA-seq library and an MS library were prepared. The library preparation procedures are briefly described below. For RNA-seq library preparation: First, 100 mg of each snap-frozen GCA tumor tissue was used for total RNA isolation with TRIzol® Reagent (15596026, Thermo Fisher Scientific). RNA purity was checked using the NanoPhotometer® spectrophotometer (IMPLEN). RNA concentration was measured using the Qubit® RNA Assay Kit in a Qubit® 2.0 Fluorometer (Life Technologies). RNA integrity was assessed using the Bioanalyzer 2100 system (Agilent Technologies). Then, two RNA-seq libraries were prepared for each GCA patient as technical replicates, with 50 ng total RNA used as input for each library. The RNA-seq libraries were prepared with the NEBNext® Ultra™ RNA Library Prep Kit for Illumina (E7530L, NEB) following the manufacturer’s instructions. RNA-seq libraries were purified with AMPure XP beads (A63880, Beckman Coulter) to select 150∼200 bp cDNA fragments. Sequencing libraries were quantified on the Bioanalyzer 2100 system (Agilent Technologies) and sequenced on an Illumina NovaSeq 6000 platform with 150 bp paired-end reads. RNA-seq reads were aligned to the reference genome (hg19) using STAR8 with default parameters. After alignment, ERBB2 RNA expression was extracted and normalized as RPKM. The final expression value for each patient, used for comparison with protein expression, is the average of the two technical replicates.
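The RPKM normalization and replicate averaging described above can be illustrated with a short Python sketch. The function names and example numbers are hypothetical; the formula is the standard RPKM definition rather than code taken from the study's repository.

```python
from typing import Sequence


def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    # reads / (length in kb) / (library size in millions)
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


def replicate_mean(values: Sequence[float]) -> float:
    """Average expression over technical replicates."""
    if not values:
        raise ValueError("at least one replicate is required")
    return sum(values) / len(values)


# Hypothetical example: two technical replicates of one patient.
rep1 = rpkm(gene_reads=1000, gene_length_bp=2000, total_mapped_reads=10_000_000)
rep2 = rpkm(gene_reads=1200, gene_length_bp=2000, total_mapped_reads=10_000_000)
patient_expression = replicate_mean([rep1, rep2])
```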
For mass spectrometry library preparation: First, 10 mg of snap-frozen GCA tumor tissue was ground into powder in liquid nitrogen and transferred to a 5-mL centrifuge tube. Four volumes of lysis buffer (1% Triton X-100, 1% protease inhibitor cocktail, 1% phosphatase inhibitor) were then added to the powder, followed by sonication three times on ice using a high-intensity ultrasonic processor (Scientz). The remaining debris was removed by centrifugation at 12,000 g at 4 °C for 10 min. After centrifugation, the supernatant was collected and the protein concentration was measured with the Pierce™ BCA protein kit (23227, Thermo Fisher Scientific) according to the manufacturer’s instructions. Then, 100 μg of protein from each sample was taken for protein digestion, and the volume was adjusted to be equal across samples with lysis buffer. Trichloroacetic acid (TCA) was slowly added to a final concentration of 20% (v/v) to precipitate the protein, and the sample was vortexed and incubated for 2 h at 4 °C. The precipitated protein was collected by centrifugation at 4500 g for 5 min at 4 °C, washed three times with pre-cooled acetone to remove traces of TCA, and finally dried in a fume cupboard to remove the acetone. 100 mM triethylammonium bicarbonate (TEAB) was then added to the protein sample, which was ultrasonically dispersed. Trypsin was added at a 1:50 trypsin-to-protein mass ratio for the first digestion overnight. The sample was reduced with 5 mM dithiothreitol for 30 min at 56 °C and alkylated with 11 mM iodoacetamide for 15 min at room temperature in darkness. Next, 50 μg of tryptic peptides was dissolved in 0.5 M TEAB. Each peptide channel was labeled with its respective TMT reagent and incubated for 2 h at room temperature. Five microliters of each sample were pooled, desalted and analyzed by MS to check labeling efficiency. After the labeling efficiency check, samples were quenched by adding 5% hydroxylamine.
The pooled samples were then desalted with a Strata X C18 SPE column (Phenomenex) and dried by vacuum centrifugation. The dried tryptic peptides were dissolved in solvent A (0.1% formic acid and 2% acetonitrile in water) and directly loaded onto a home-made reversed-phase analytical column (25-cm length, 100 µm i.d.). Peptides were separated with a gradient from 8% to 10% solvent B (0.1% formic acid in 90% acetonitrile) over 2 min, 10% to 23% solvent B over 38 min, 23% to 33% in 14 min, climbing to 80% in 3 min and then holding at 80% for the last 3 min, all at a constant flow rate of 450 nL/min on an EASY-nLC 1200 UPLC system (Thermo Fisher Scientific). The separated peptides were analyzed in a Q Exactive™ HF-X (Thermo Fisher Scientific) with a nano-electrospray ion source. The electrospray voltage applied was 2.2 kV. The full MS scan resolution was set to 120,000 for a scan range of 400–1500 m/z. Up to the 20 most abundant precursors were then selected for further MS/MS analyses with 30 s dynamic exclusion. HCD fragmentation was performed at a normalized collision energy (NCE) of 28%. The fragments were detected in the Orbitrap at a resolution of 45,000. The fixed first mass was set to 100 m/z. The automatic gain control (AGC) target was set to 5E4, with an intensity threshold of 5.8E4 and a maximum injection time of 86 ms. The resulting MS/MS data were processed using the MaxQuant search engine (v.1.6.10.43). Tandem mass spectra were searched against the human SwissProt database (20366 entries) concatenated with a reverse decoy database. Trypsin/P was specified as the cleavage enzyme, allowing up to 2 missed cleavages. The mass tolerance for precursor ions was set to 10 ppm in the first search and 5 ppm in the main search, and the mass tolerance for fragment ions was set to 0.02 Da. Carbamidomethylation of Cys was specified as a fixed modification. Acetylation of the protein N-terminus, oxidation of Met and deamidation (NQ) were specified as variable modifications.
TMT-11plex quantification was performed. FDR was adjusted to < 1% and minimum score for peptides was set > 40. The ERBB2 protein expression level for each patient was extracted from protein lists of MS result.
Data analysis of WGS data, WES data, copy number alteration (CNA) and ecDNA amplicons
All detailed scripts are deposited at the following link: https://github.com/chenlab2019/ecDNA-on-GCA. The WGS data of 36 samples were aligned to the reference genome (hg19) using BWA-MEM v.0.7.179 with default parameters and sorted with SAMtools v.1.910. PCR duplicates were removed from the aligned BAM files with Sambamba v.0.7.011. Taking matched normal samples as background, tumor-specific CNAs were called with the copyCat package (https://github.com/chrisamiller/copyCat), which is loosely based on readDepth12. During CNA calling, bam-window (https://github.com/genome-vendor/bam-window) was used to count reads in 10-kbp windows. AmpliconArchitect (AA) was applied to filter CNAs with copy number greater than 4x and size greater than 100 kbp. Adjacent CNAs were merged into a single interval. These intervals were fed into the AmpliconArchitect software13 as seeds to detect ecDNA amplicons14. The oncogene annotation of ecDNA amplicons was based on the genomic intervals of the amplicons, following the AA pipeline13. The genomic annotation of ecDNA amplicons was performed by intersecting the regions of ecDNA amplicons with the genomic annotation of the reference genome (hg19) using bedtools15. In brief, the regions of the ecDNA amplicons were extracted from the output of the AA software. The intersection between the genomic annotation of the reference genome (hg19) and the ecDNA regions was first computed with bedtools15, and the length of the overlapping regions between genomic elements of the reference genome and ecDNA regions was then extracted. Genomic elements were annotated to ecDNA amplicons if the overlap was 1 bp or longer. The occupancy of coding regions and exon regions in ecDNA amplicons was calculated with the following formulas: EcDNA amplicons were further classified into different categories (linear, complex, circular, breakage-fusion-bridge (BFB) and invalid) with AmpliconClassifier (https://github.com/jluebeck/AmpliconClassifier), following a previous report16.
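The seed-selection step above (thresholding CNAs, then merging adjacent ones) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the deposited pipeline: the function name is hypothetical, CNAs passing both thresholds are assumed to be the ones retained, filtering is assumed to precede merging, and "adjacent" is interpreted as segments on the same chromosome separated by at most `max_gap` bases (0 by default).

```python
from typing import List, Tuple

Segment = Tuple[str, int, int, float]  # (chromosome, start, end, copy number)
Interval = Tuple[str, int, int]        # (chromosome, start, end)


def select_seed_intervals(
    segments: List[Segment],
    min_copy_number: float = 4.0,   # threshold taken from the text
    min_size_bp: int = 100_000,     # threshold taken from the text
    max_gap_bp: int = 0,            # assumption: only touching segments merge
) -> List[Interval]:
    """Keep amplified CNA segments and merge adjacent ones into seeds."""
    kept = sorted(
        (chrom, start, end)
        for chrom, start, end, cn in segments
        if cn > min_copy_number and (end - start) > min_size_bp
    )
    merged: List[Interval] = []
    for chrom, start, end in kept:
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap_bp:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical segments for illustration only.
example = [
    ("chr17", 0, 200_000, 6.0),
    ("chr17", 200_000, 400_000, 8.0),
    ("chr17", 500_000, 550_000, 9.0),   # too short, dropped
    ("chr7", 0, 300_000, 3.0),          # copy number too low, dropped
]
seeds = select_seed_intervals(example)
```

The resulting intervals correspond to the seeds that would be passed to AmpliconArchitect.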
Circle-finder17-19 was used to confirm the circular structure of ecDNA amplicons following its instructions, where circular junction points were detected from the orientation of sequencing reads. The length of the overlapping region between circular ecDNA predicted by AA and circular ecDNA detected with Circle-finder was calculated with bedtools. When the overlapping region was longer than 1 bp, circular ecDNA amplicons from AA were labelled as overlapping with the Circle-finder results.
For WES data analysis of the 75 pairs of GCA tumor samples and matched adjacent normal tissues: sequencing reads containing adaptors and low-quality reads were removed, and the remaining reads were aligned to the human reference genome (hg19) using BWA-MEM v.0.7.179 with default parameters and sorted with SAMtools v.1.910. All non-primary alignments were filtered out with SAMtools. PCR duplicates were marked using Picard. CNAs in tumors were called with CNVkit20 using matched adjacent normal tissues as controls. The copy numbers of the ERBB2 gene from each GCA patient were extracted for further analysis.
Data mining of Circle-seq
All reads were aligned to the human genome (hg19) using BWA-MEM v.0.7.179 with default parameters. PCR duplicates were removed from the BAM files with Sambamba v.0.7.09. Taking normal samples as background, peak calling on tumor samples was performed using the variable-width windows of Homer v.4.11 with the command findPeaks tumor -i normal -style histone -fdr 0.001 (http://homer.ucsd.edu/)23. The tumor-specific enriched peaks were considered fragments of circular DNA. Overlaps between enriched peaks from Circle-Seq and ecDNA amplicons from AA were calculated, and a circular ecDNA amplicon from AA was labelled as validated when the overlap was 1 bp or longer. For visualization of the Circle-Seq peaks, BAM files were converted into bigwig files using deeptools bamCoverage with counts per million (CPM) normalization24.
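The 1-bp overlap criterion used for validation can be expressed compactly. The sketch below is an assumption-laden illustration, not the deposited code: function names are hypothetical, and intervals are assumed to use BED-style half-open coordinates (start inclusive, end exclusive), as produced by bedtools.

```python
from typing import Iterable, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if none)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def is_validated(
    amplicon: Interval,
    circle_seq_peaks: Iterable[Interval],
    min_overlap_bp: int = 1,
) -> bool:
    """True when any Circle-Seq peak overlaps the amplicon by >= min_overlap_bp."""
    return any(overlap_bp(amplicon, peak) >= min_overlap_bp for peak in circle_seq_peaks)
```

In practice the same result is obtained from the overlap lengths reported by a bedtools intersection; the sketch only makes the threshold explicit.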
Detection of chromothripsis events
All detailed scripts are deposited at the following link: https://github.com/chenlab2019/ecDNA-on-GCA. Chromothripsis events in 36 pairs of GCA tumor samples were detected with ShatterSeek v.0.4 using copy number alterations (CNAs) and structural variants (SVs), following a previous report25. SVs were identified in tumor samples using Delly26 and novoBreak27 with matched adjacent normal tissues as controls, and the final list of SVs was obtained by merging the lists from Delly and novoBreak. CNAs from WGS were calculated with the copyCat package28. All SVs and CNAs from tumor samples were used to identify chromothripsis events with ShatterSeek, with SVs and CNAs from matched adjacent normal tissues treated as background. Events were considered high confidence (HC) when there were at least 7 oscillating CN segments, and low confidence (LC) when there were 4-6 oscillating CN segments11. Chromothripsis events were labeled as lying within regions of ecDNA amplicons when there was an intersection of 1 bp or longer between chromothripsis segments and regions of ecDNA amplicons.
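The confidence thresholds on oscillating copy-number segments can be written as a small helper. This sketch covers only the segment-count rule stated above; the function name is hypothetical, and any additional criteria applied by ShatterSeek itself are not represented.

```python
from typing import Optional


def chromothripsis_confidence(n_oscillating_cn_segments: int) -> Optional[str]:
    """Classify a candidate event by its number of oscillating CN segments.

    Returns "HC" (high confidence) for at least 7 segments, "LC" (low
    confidence) for 4-6 segments, and None when the count is below 4.
    """
    if n_oscillating_cn_segments < 0:
        raise ValueError("segment count cannot be negative")
    if n_oscillating_cn_segments >= 7:
        return "HC"
    if n_oscillating_cn_segments >= 4:
        return "LC"
    return None
```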
Single-nucleotide variant (SNV) analysis
All detailed scripts are deposited at the following link: https://github.com/chenlab2019/ecDNA-on-GCA. All SNVs from WGS were called with GATK v.4.1.729 using Mutect2 and filtered with “GATK FilterMutectCalls”. The mutation profiles were visualized with the R/Bioconductor package “maftools”30. The numbers of SNVs within regions of ecDNA amplicons and within the whole genome were counted separately for each sample. The average numbers of SNVs per million nucleotides in regions of ecDNA amplicons and in the whole genome were calculated with the following equations: The numbers of SNVs within individual oncogene ecDNA amplicons were also compared between patients with and without the corresponding oncogene ecDNA: first, oncogene ecDNA amplicons occurring at high frequency (present in at least 2 of the 36 patients) were selected; then the number of SNVs within each selected oncogene was calculated for each patient, and the numbers of SNVs were compared between the groups with and without this oncogene ecDNA.
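The SNV density comparison can be illustrated with a short sketch. The equations themselves are not reproduced in this text, so the code assumes the natural reading of "SNVs per million nucleotides", namely the SNV count divided by the region length and scaled by 10^6; the function name and example values are hypothetical.

```python
def snvs_per_million_nt(n_snvs: int, region_length_bp: int) -> float:
    """SNV density, expressed as SNVs per million nucleotides."""
    if region_length_bp <= 0:
        raise ValueError("region length must be positive")
    return n_snvs * 1_000_000 / region_length_bp


# Hypothetical counts for one sample: ecDNA regions versus the whole genome.
ecdna_density = snvs_per_million_nt(n_snvs=30, region_length_bp=2_000_000)
genome_density = snvs_per_million_nt(n_snvs=9_000, region_length_bp=3_000_000_000)
```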
Oncogene ecDNA amplicon analysis
All detailed scripts are deposited at the following link: https://github.com/chenlab2019/ecDNA-on-GCA. The list of oncogene and tumor suppressor gene ecDNA amplicons was extracted from the AmpliconArchitect report, following the AmpliconArchitect workflow13. The copy number of each oncogene in the 36 GCA samples was extracted from the copyCat report. Oncogenes and/or tumor suppressor genes were labeled as co-amplified if two or more of them were located in the same ecDNA amplicon.
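The co-amplification rule can be sketched as a filter over a mapping from amplicon identifiers to the cancer genes they contain. The data structure, function name and amplicon identifiers below are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def coamplified_genes(amplicon_genes: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return amplicons carrying two or more oncogenes/tumor suppressor genes."""
    return {
        amplicon_id: sorted(genes)
        for amplicon_id, genes in amplicon_genes.items()
        if len(genes) >= 2
    }


# Hypothetical amplicon contents for illustration only.
example_amplicons = {
    "patient01_amplicon1": {"ERBB2", "CCNE1"},
    "patient01_amplicon2": {"EGFR"},
    "patient02_amplicon1": set(),
}
coamplified = coamplified_genes(example_amplicons)
```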
Calculation of Chromosomal instability (CIN)
All detailed scripts are deposited at the following link: https://github.com/chenlab2019/ecDNA-on-GCA. Chromosomal instability (CIN) was calculated following a previous report31, and CIN groups were defined by the genome integrity index (GII). GII was defined as the fraction of the genome that was altered, based on the common regions of alteration. GCA patients were divided into four CIN groups based on GII (0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8), and the 36 GCA patients were assigned to these groups.
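A minimal sketch of the GII calculation and CIN grouping is given below. It rests on several assumptions not stated in the text: altered segments are non-overlapping, bin boundaries are half-open (a value on a boundary falls into the upper bin), and values outside 0-0.8 are rejected because no group is defined for them. Function names are hypothetical.

```python
from typing import List, Tuple

# CIN groups from the text; half-open boundaries are an assumption.
CIN_BINS: List[Tuple[float, float]] = [(0.0, 0.2), (0.2, 0.4), (0.4, 0.6), (0.6, 0.8)]


def genome_integrity_index(
    altered_segments: List[Tuple[int, int]], genome_length_bp: int
) -> float:
    """Fraction of the genome covered by altered segments.

    Segments are (start, end) pairs and are assumed not to overlap.
    """
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    altered_bp = sum(end - start for start, end in altered_segments)
    return altered_bp / genome_length_bp


def cin_group(gii: float) -> str:
    """Assign a GII value to one of the four CIN groups."""
    for low, high in CIN_BINS:
        if low <= gii < high:
            return f"{low}-{high}"
    raise ValueError(f"GII {gii} is outside the defined groups")
```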
Prognoses and statistical analysis
All computational code and scripts are deposited at the following link: https://github.com/chenlab2019/ecDNA-on-GCA. The R package “survival” with the Kaplan-Meier method32 was used to calculate and compare patient prognosis between different groups of GCA patients. The statistical methods used in the prognosis analysis with clinicopathological factors were as follows: Fisher’s exact test for sex, family history, cigarette smoking, alcohol consumption and tumor stage, and the Wilcoxon signed-rank test for age. All analyses were performed in R v.3.6.2, Python v.2.7.16 and Python 3.7.4. Visualization of survival curves was conducted with the ggplot233, karyoploteR34 and pheatmap R packages, Circos37 and IGV software38.
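For readers unfamiliar with the Kaplan-Meier method, the estimator can be illustrated in plain Python. The analysis in this study used the R package “survival”; the sketch below is an independent, simplified reimplementation of the product-limit estimator for illustration only, with a hypothetical function name and toy data.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times  -- follow-up time for each patient
    events -- True if death was observed, False if censored
    Returns (time, survival probability) at each distinct death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Toy data: four patients, the second one censored at time 2.
toy_curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
```

Comparing two such curves between patient groups (for example with a log-rank test) is what the survival package performs in the actual analysis.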
Immunohistochemistry (IHC) staining of ERBB2 protein
IHC was performed following a previous report39 with slight modifications. In brief, 5-µm thick formalin-fixed paraffin-embedded GCA tissue sections were first deparaffinized in xylene three times for 15 min each, then passed through 100% alcohol, 85% alcohol and 75% alcohol for 5 min each, followed by rinsing in distilled water for 5 min. Epitope retrieval was performed in a microwave oven with the tissue placed in citrate buffer (pH 6.0). After epitope retrieval, the tissue sections were rinsed in phosphate-buffered saline (PBS, pH 7.4). After blocking with 3% bovine serum albumin (BSA) for 30 min at room temperature, the tissues were incubated with ERBB2 antibody (1:100 dilution, SAB5700151, Sigma-Aldrich) overnight at 4 °C. The next day, the sections were washed with PBS three times for 15 min each. The secondary antibody (horseradish peroxidase (HRP)-labeled, PV-9000, ZSGB-BIO) was incubated for 50 min at room temperature. After the secondary antibody incubation, the sections were washed with PBS three times on a shaker for 15 min each. The tissue was stained with Harris hematoxylin for 3 min. Finally, the tissue sections were mounted and imaged. Sections with no signal in any cell were defined as negative; sections with 5 or more cells with ERBB2-positive signal were defined as positive.
Immunohistochemistry (IHC) staining of γH2AX
The staining protocol was the same as for ERBB2 staining. The primary antibody against γH2AX (SAB5700329, Sigma-Aldrich) was used at 1:200 dilution. γH2AX staining was categorized into positive and negative groups with the following criteria: sections with no γH2AX signal in any cell were defined as γH2AX negative; sections with 5 or more cells with γH2AX-positive signal were defined as γH2AX positive.
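The scoring rule shared by the ERBB2 and γH2AX staining can be sketched as a simple function of the number of positive cells in a section. The function name is hypothetical. The text does not state how sections with 1-4 positive cells were handled, so the sketch returns None for them rather than guessing.

```python
from typing import Optional


def ihc_status(n_positive_cells: int, min_positive_cells: int = 5) -> Optional[str]:
    """Score a section as "negative" or "positive" from its positive-cell count.

    Returns None for counts between 1 and min_positive_cells - 1, a range
    the stated criteria do not cover.
    """
    if n_positive_cells < 0:
        raise ValueError("cell count cannot be negative")
    if n_positive_cells == 0:
        return "negative"
    if n_positive_cells >= min_positive_cells:
        return "positive"
    return None
```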
MSI detection with Immunohistochemistry (IHC) staining
IHC staining of four mismatch repair (MMR) proteins, MLH1 (1:100, PA5-32497, Thermo Fisher Scientific), MSH2 (1:500, MA5-15740, Thermo Fisher Scientific), MSH6 (1:100, MA5-32040, Thermo Fisher Scientific) and PMS2 (1:150, MA5-26269, Thermo Fisher Scientific), was performed on 5-µm thick FFPE tumor sections from 36 GCA patients with the same protocol as described above for ERBB2 IHC staining. A patient was labeled as microsatellite instability high (MSI-high) if any one of the MMR proteins stained negative; otherwise the patient was labeled as MSI-low.
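The MSI labeling rule can be written as a short helper over the staining results of the four mismatch repair proteins. The function name and the dictionary representation (protein name mapped to True for positive staining) are hypothetical choices made for this sketch.

```python
from typing import Dict

REPAIR_PROTEINS = ("MLH1", "MSH2", "MSH6", "PMS2")


def msi_label(staining_positive: Dict[str, bool]) -> str:
    """Label a patient MSI-high if any of the four proteins stains negative."""
    missing = [p for p in REPAIR_PROTEINS if p not in staining_positive]
    if missing:
        raise ValueError(f"missing staining results for: {', '.join(missing)}")
    if any(not staining_positive[p] for p in REPAIR_PROTEINS):
        return "MSI-high"
    return "MSI-low"
```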
Acknowledgements
This work was supported by the National Key R&D Program of China (2016YFC0901403 to L.D.W.), the National Natural Science Foundation of China (81872032, U1804262 to L.D.W.), the Swedish Research Council (VR-2016-06794, VR-2017-02074 to X.C.), Beijer Foundation (to X.C.), Jeassons Foundation (to X.C.), Petrus och Augusta Hedlunds Stiftelse (to X.C.), Göran Gustafsson’s prize for younger researchers (to X.C.), Vleugel Foundation (to X.C.), and Uppsala University (to X.C.).
Footnotes
4 Lead author