Abstract
Sources of elevated cell-free DNA (cfDNA) in early-stage cancer patients are poorly understood. Leveraging a rich dataset of cfDNA in healthy individuals and cancer patients, we find that a large majority of cfDNA in cancer patients originates from non-tumor sources, and that the presence of early-stage cancer results in a multiplicative increase in the concentration of cfDNA originating from healthy tissue. The magnitude of the multiplicative increase is cancer-type specific and ranges from a ∼1.3-fold increase in lung cancer to a ∼12-fold increase in liver cancer. As cfDNA is cleared in the liver, the large increase in patients with liver cancer may imply that the systemic increase in cfDNA levels in the presence of cancer is due to a slower clearance rate rather than higher cell turnover.
Main
Identifying biomarkers that can reliably detect cancer in early stages is one of the major challenges in oncology1. One promising biomarker is cell-free DNA (cfDNA), composed of the DNA fragments not bound to cells, that can be detected in the circulation. Increased levels of cfDNA are often reported in cancer patients2–5, in particular in later stages6. However, the mechanisms by which the presence of cancer affects cfDNA are not fully understood7–9. Recent work demonstrates that the majority of cfDNA in cancer patients with highly elevated cfDNA originates from leukocytes, and that the presence of cancer likely results in a systemic increase in cfDNA concentration in these patients10. Whether a similar systemic increase due to cancer is present in all cancer patients is currently not known.
In this setting, several questions must be answered in order to better understand the mechanics of DNA circulating in plasma, which could in turn improve the accuracy of liquid biopsy screening tests. In what way, and by how much, does the presence of cancer affect the total cfDNA concentration? Is this effect cancer type-specific, and does it differ between early cancer stages? In the present work, we address these questions by employing a large dataset from Cohen et al.11 for healthy individuals and patients with stage I-III ovarian, liver, stomach, pancreatic, esophageal, colorectal, lung, and breast cancer. This study11 reports the cfDNA concentrations in plasma, as well as the concentration and mutant allele frequency (MAF) of DNA fragments in plasma harboring mutations in cancer driver genes (Methods); we refer to this portion of cfDNA as circulating mutant DNA (cmDNA).
We observe that the 631 healthy individuals from Cohen et al.11 (Methods) can be divided into two groups: 572 (90.6%) who have cmDNA detected and a median cfDNA concentration of 2.83 ng/mL, and 59 (9.4%) who have no mutations detected in plasma and exhibit significantly (54 times) lower cfDNA concentrations (median 0.05 ng/mL) (Fig. S1, Methods). Determining why a portion of the healthy population exhibits such low cfDNA concentrations requires further experimental investigation and is outside the scope of the present study. Since we aim to quantify and compare the effect of cancer on both cfDNA and cmDNA concentrations, we focus on the 90.6% of healthy individuals with cmDNA detected in plasma.
The mutant allele frequency of the most prevalent mutation detected in the plasma of cancer patients is low across all cancer types: the median MAF is 0.06%, and 88.2% of mutations detected in plasma have a MAF below 1%. Combined with the fact that 82% of cancer patients’ tumors harbor mutations (at high frequency) that can be detected by the liquid biopsy panel from Cohen et al.11, this indicates that early-stage tumors in all cancer types analyzed do not contribute significantly to cfDNA concentration; most cfDNA originates from non-tumor (healthy) tissue.
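As a rough worked example of this argument, consider a simplification that is ours and not part of the source analysis: suppose the plasma mutation is tumor-derived, clonal, and heterozygous in a diploid region, and is absent from non-tumor cells. A tumor-derived fraction f of cfDNA would then give a MAF of about f/2 (the symbol f is introduced here for illustration only):

```latex
\mathrm{MAF} \approx \frac{f}{2}
\quad\Longrightarrow\quad
f \approx 2 \times \mathrm{MAF} = 2 \times 0.06\% = 0.12\%
```

Under these assumptions, at the median MAF more than 99.8% of cfDNA would be of non-tumor origin.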
It has been reported that most cfDNA circulating in plasma comes from hematopoiesis in healthy individuals12–15 and in some cancer patients13. Recent work by Mattox et al.10 found that, in healthy individuals and in cancer patients with highly elevated cfDNA levels, the vast majority (76%) of cfDNA arises from leukocytes. They report that, in cancer patients with highly elevated cfDNA levels, the presence of cancer leads to a systemic increase in cfDNA concentration arising from non-tumor sources10. Inspired by this finding, we asked whether a similar increase in cfDNA may be present in all cancer patients. We found that the presence of stage I-III lung, breast, colorectal, pancreatic, ovarian, esophageal, stomach, and liver cancer leads to a multiplicative increase in cfDNA concentration compared to healthy individuals (Fig. 1). The multiplicative increase we uncovered is not simply an increase in the typical cfDNA concentration: we find that the entire distribution of cfDNA concentrations in patients with cancer matches that of healthy individuals multiplied by a positive factor (Kolmogorov-Smirnov test at the 0.05 significance level, Methods). This effect can be seen as a horizontal “shift” or translation when the cumulative distribution function is plotted on a logarithmic x-scale (Fig. 1).
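The link between a multiplicative factor and a horizontal shift on a logarithmic axis can be written out explicitly. In the following sketch, the notation is ours rather than the source's: X_H and X_C denote healthy and cancer cfDNA concentrations, F_H and F_C their cumulative distribution functions, and a the multiplicative factor.

```latex
X_C \overset{d}{=} a\,X_H,\quad a > 0
\quad\Longrightarrow\quad
F_C(x) = \Pr(a X_H \le x) = F_H\!\left(\frac{x}{a}\right),
\qquad
F_C\!\left(e^{u}\right) = F_H\!\left(e^{\,u - \ln a}\right)
```

In the variable u = ln x, the cancer CDF is therefore the healthy CDF translated by ln a, which is a shift to the right whenever a > 1.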
The multiplicative increase in cfDNA concentration is specific to each cancer type: 1.29 (value range not rejected by KS test at 5% significance level: 1.15-1.48) for lung cancer, 1.15 (0.9-1.43) for stage I breast cancer, 1.79 (1.45-1.99) for stage II/III breast cancer, 2.54 (2.22-2.63) for colorectal, 2.69 (2.16-3.93) for pancreatic, 4.3 (3.69-5.59) for ovarian, 5.24 (3.85-7.2) for esophageal, 5.66 (5.49-5.85) for stomach, and 12.63 (10.94-16.81) for liver cancer (Fig. 1, Fig. S2, Table S1, Methods). We note that the multiplicative increase in cfDNA concentration in these cancer types originates from healthy tissue, as the large majority of cfDNA in stage I-III cancer patients is not shed from the tumor. There are no statistically significant differences in cfDNA concentration between stages I to III for any cancer type except breast cancer and liver cancer; for liver cancer, the stage I sample is small (5 patients; Methods).
The uniform increase in cfDNA concentration observed in all evaluated cancer types could stem from two sources: either cfDNA shedding from non-tumor sources is systemically increased by a multiplicative factor in patients with cancer, or the presence of cancer lowers the clearance of cfDNA from circulation by a similar factor. There is evidence that an excess of dying cells can lead to slower cfDNA clearance, e.g. during vigorous exercise; the presence of cancer may also lead to an increase in dying cells, potentially slowing down the clearance of cfDNA. Cell-free DNA is typically cleared in the spleen, kidneys, and liver7; it is of note that cfDNA concentration is highest in patients with liver cancer (a ∼12-fold increase compared to healthy individuals), which points to a slower cfDNA clearance rate as a potential mechanism by which tumor presence results in the multiplicative increase of cfDNA levels.
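The equivalence of these two explanations at the level of concentrations can be illustrated with a deliberately simple model. The model is an assumption introduced here, not one made in the source: a single compartment with a constant shedding rate s, first-order clearance with rate constant k, and concentration C(t).

```latex
\frac{dC}{dt} = s - kC
\quad\Longrightarrow\quad
C^{*} = \frac{s}{k},
\qquad
a\,C^{*} = \frac{a\,s}{k} = \frac{s}{k/a}
```

In this sketch, an a-fold increase in shedding and an a-fold decrease in the clearance rate constant produce the same steady-state concentration, so concentration data alone cannot distinguish between them.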
We next hypothesized that, if cfDNA shed from non-tumor sources exhibits a multiplicative increase in the presence of cancer, then a similar increase should also be observed in circulating mutant DNA (cmDNA) that does not originate from the tumor. In the dataset we use11, both the most prevalent mutation in plasma and the mutations detected in the tumor are reported for 645 of the 959 cancer patients (Methods). Comparing the cmDNA concentration in healthy individuals with that in cancer patients whose plasma mutation is not detected in the tumor, we find uniform increases by a multiplicative factor for every cancer type: 1.92 (1.42-2.52) for lung, 2.42 (2.12-2.76) for breast, 2.36 (1.92-2.66) for colorectal, 4.01 (1.55-7.34) for pancreatic, 3.31 (1.76-5.38) for esophageal, 4.38 (2.74-5.77) for stomach, and 12.37 (5.54-24.62) for liver cancer (Fig. 2, Fig. S3, Table S2, Methods). We observe that the ranges of multiplicative factor values for cfDNA and for cmDNA from healthy tissue overlap for each of the cancer types (Fig. S4, Methods), except for breast cancer, where the ranges are comparable yet non-overlapping.
Mutant DNA in plasma may originate from healthy or tumor tissue; however, it may also be detected artificially due to sequencing errors. In a more recent work16, Cohen et al. found that the artefactual mutations in plasma appeared with a MAF lower than 0.013%. Hence, we perform the cmDNA analysis again, disregarding the mutations with low MAFs (Table S3, Methods); the results remain similar to the results obtained for all mutations (Fig. S4, Methods).
The agreement between the multiplicative factors in total cfDNA concentration, and the concentration of cmDNA not originating from tumor, is a further validation that the presence of early-stage cancer results in a multiplicative shift towards higher concentrations of cfDNA (both wildtype and mutant) shed from healthy tissue; this increase is quantifiable and cancer type-specific. At this point, we cannot answer definitively whether this increase is the result of excessive cfDNA shedding or slower cfDNA clearance; it is also possible that these two causes are related: an initial, localized increase in cell turnover due to tumor presence could result in excess cfDNA that would slow down the cfDNA clearance rate from circulation.
Note that the reported cfDNA concentrations may differ significantly depending on the method used to analyze the samples. However, using additional data from Cohen et al.17, we again observe a multiplicative factor between the cfDNA concentrations of healthy individuals whose samples were analyzed using different methodologies (Fig. S5, Methods).
Finally, the cancer-specific multiplicative factor may be affected by therapy or anesthesia. More specifically, we observe a reduced multiplicative factor for a cohort of pancreatic cancer patients, many of whom had their blood sample taken after administration of anesthesia17. This is consistent with the reported reduction in cfDNA concentration due to anesthesia18 (Fig. S6, Methods).
Methods
Patient data
Cohen et al.11 report the concentration (ng/mL) of total cell-free DNA (cfDNA) detected in plasma for 812 healthy individuals and for: 104 patients with lung cancer, 209 with breast cancer, 388 with colorectal cancer (CRC), 93 with pancreatic cancer, 54 with ovarian cancer, 45 with esophageal cancer, 68 with stomach cancer, and 44 with liver cancer. All cancer cases reported are in stages I to III. The dataset includes samples from 181 healthy individuals and 46 pancreatic cancer patients evaluated in a previous study17. To ensure that all samples were analyzed following the same process, we exclude these samples from the present work.
Cohen et al.11 also report the concentration (fragments/mL) of the circulating mutant DNA (cmDNA) of the most prevalent DNA mutation detected in plasma for the same population of healthy individuals and cancer patients. The panel for mutation detection from plasma includes 61 amplicons, each querying an average of 33 base pairs within one of 16 cancer driver genes. The genes included in the panel, with the number of amplicons for each gene, are: NRAS (2), CTNNB1 (2), PIK3CA (4), FBXW7 (4), APC (2), EGFR (1), BRAF (1), CDKN2A (2), PTEN (4), FGFR2 (1), HRAS (1), KRAS (3), AKT1 (1), TP53 (31), PPP2R1A (1), GNAS (1).
Apart from the most prevalent mutation in plasma, mutations found in the tumor are also reported for 336 of the patients with colorectal cancer; the respective number of patients is 70 for lung cancer, 116 for breast cancer, 15 for pancreatic cancer, 7 for ovarian cancer, 36 for esophageal cancer, 47 for stomach cancer, and 24 for liver cancer. Since it is important to compare mutations detected in the tumor and plasma of the same patient, we restrict our cmDNA analysis to this subset of patients with known mutations in both plasma and tumor; we also exclude the 1 lung cancer patient, 2 CRC patients, 1 esophageal cancer patient, and 2 stomach cancer patients for whom no cmDNA was detected in plasma. Due to the small dataset (7 patients), we exclude ovarian cancer from the cmDNA analysis.
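The cohort sizes quoted in the main text follow from the counts above by simple arithmetic. The short script below is a bookkeeping sketch that uses only the numbers stated in this section; the variable names are ours.

```python
# Counts as stated in the Patient data section (Cohen et al. 2018 dataset).
healthy_reported = 812
cancer_reported = {
    "lung": 104, "breast": 209, "colorectal": 388, "pancreatic": 93,
    "ovarian": 54, "esophageal": 45, "stomach": 68, "liver": 44,
}

# Samples evaluated in the earlier study are excluded from the present work.
healthy_used = healthy_reported - 181
cancer_used = dict(cancer_reported)
cancer_used["pancreatic"] -= 46
total_cancer_used = sum(cancer_used.values())

# Patients with mutations reported in both tumor and plasma.
tumor_reported = {
    "colorectal": 336, "lung": 70, "breast": 116, "pancreatic": 15,
    "ovarian": 7, "esophageal": 36, "stomach": 47, "liver": 24,
}
# Patients with no cmDNA detected in plasma are dropped.
no_cmdna = {"lung": 1, "colorectal": 2, "esophageal": 1, "stomach": 2}
with_cmdna = {k: n - no_cmdna.get(k, 0) for k, n in tumor_reported.items()}
total_with_cmdna = sum(with_cmdna.values())

# Ovarian cancer (7 patients) is left out of the cmDNA analysis.
total_cmdna_analysis = total_with_cmdna - with_cmdna["ovarian"]

print(healthy_used)          # 631
print(total_cancer_used)     # 959
print(total_with_cmdna)      # 645
print(total_cmdna_analysis)  # 638
```

The first three values match the 631 healthy individuals and the 645 of 959 cancer patients quoted in the main text; the last is the number left once ovarian cancer is excluded.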
Kolmogorov-Smirnov test
The Kolmogorov-Smirnov (KS) test is a nonparametric test for continuous one-dimensional probability distributions that we employ to check whether two datasets may come from the same distribution, or whether a dataset is drawn from a particular distribution19. More specifically, in the case of two datasets, we use the KS test to check whether multiplying all elements of one dataset by the same deterministic factor yields a dataset that may originate from the same distribution as the other.
Difference in cfDNA/cmDNA concentrations between different cancer stages
Cohen et al.11 report the cfDNA concentrations for patients in stages I/II/III of various cancer types. The number of patients in each stage is 46/27/31 for lung cancer, 32/114/63 for breast cancer, 77/191/120 for colorectal cancer, 2/39/6 for pancreatic cancer, 9/4/41 for ovarian cancer, 5/29/11 for esophageal cancer, 21/30/17 for stomach cancer and 5/19/20 for liver cancer. When we perform KS tests between the samples of cfDNA concentrations in different stages of the same cancer type, the hypothesis that the samples originate from the same distribution is not rejected, at a 5% significance level, in any case except breast and liver cancer. For breast cancer, the KS test p-value between stages II and III is 0.59, whereas the p-values between stages I and II and between stages I and III are 0.005 and 0.001, respectively. Thus, cfDNA concentrations in stage I breast cancer are significantly different from the concentrations in stages II/III. The same is observed for liver cancer stages, with p-values of 0.01, <0.001, and 0.18 for the KS tests between stages I and II, I and III, and II and III, respectively; however, for liver cancer, the sample of stage I patients is small (5 patients). For cmDNA samples from cancer patients whose plasma mutation is not detected in the tumor, we observe no significant differences in concentrations between sexes or stages.
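The pairwise stage comparison can be sketched as follows. This is an illustration rather than the analysis code behind the reported p-values: the data are synthetic placeholders (only the sample sizes 32/114/63 mirror the breast cancer stage counts), the helper names are ours, and the decision rule uses the large-sample critical value of the two-sample KS test, which is only approximate for small groups such as 5 patients.

```python
import bisect
import math
from itertools import combinations


def ks_distance(xs, ys):
    """Largest gap between the empirical CDFs of two samples."""
    xs, ys = sorted(xs), sorted(ys)
    return max(
        abs(bisect.bisect_right(xs, v) / len(xs) - bisect.bisect_right(ys, v) / len(ys))
        for v in xs + ys
    )


def rejects(xs, ys, alpha=0.05):
    """Large-sample KS decision: reject if the distance exceeds the critical value."""
    n, m = len(xs), len(ys)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_distance(xs, ys) > critical


def placeholder_sample(n, scale):
    """Synthetic values evenly spaced on a log scale; NOT patient data."""
    return [scale * math.exp((i + 0.5) / n) for i in range(n)]


# Sample sizes follow the breast cancer stage counts; the values are made up so
# that stage I sits lower than stages II and III by a factor of 1.6.
stages = {
    "I": placeholder_sample(32, 1.0),
    "II": placeholder_sample(114, 1.6),
    "III": placeholder_sample(63, 1.6),
}
decisions = {(s, t): rejects(stages[s], stages[t]) for s, t in combinations(stages, 2)}
for pair, rejected in decisions.items():
    print(pair, "differ" if rejected else "not distinguishable")
```

With these placeholder values, stage I is separated from stages II and III while stages II and III are not distinguished from each other, which is the qualitative pattern described above for breast cancer.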
Multiplicative factors for increased cfDNA/cmDNA in cancer patients
We determine the value of the multiplicative factor a that minimizes the Kolmogorov distance between the empirical CDF of a times the sample of cfDNA concentrations in healthy individuals with mutations detected in plasma, and the empirical CDF of the sample of cfDNA concentrations in patients of each cancer type. We report the value of the multiplicative factor a, and the p-value of the KS test between the sample of cfDNA concentrations in healthy individuals with mutations detected in plasma multiplied by a, and the sample of cfDNA concentrations in patients of each cancer type (Table S1).
We also report the range of values of the multiplicative factor a for which the KS test does not reject, at the 5% significance level, the hypothesis that the cfDNA data for cancer patients and the cfDNA data for healthy individuals multiplied by the respective factor come from the same distribution. We also observe the agreement between the datasets in probability-probability (P-P) plots (Fig. S2), which stay close to the diagonal for all types of cancer.
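The search for a and for its non-rejected range can be sketched in code. The following is a minimal illustration under stated assumptions, not the implementation used to produce Table S1: it scans a fixed grid of candidate factors (the grid and its resolution are our choice), uses the asymptotic Kolmogorov series for the p-value, and runs on synthetic data in which the "cancer" sample is exactly 2.5 times the "healthy" sample. Library routines such as `scipy.stats.ks_2samp` return the same distance and a p-value directly.

```python
import bisect
import math


def ks_distance(xs, ys):
    """Kolmogorov distance: largest gap between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    return max(
        abs(bisect.bisect_right(xs, v) / len(xs) - bisect.bisect_right(ys, v) / len(ys))
        for v in xs + ys
    )


def ks_pvalue(d, n, m):
    """Asymptotic two-sided p-value from the Kolmogorov series."""
    root = math.sqrt(n * m / (n + m))
    lam = (root + 0.12 + 0.11 / root) * d  # common small-sample correction
    if lam < 0.2:  # the series is numerically equal to 1 in this regime
        return 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * series))


def fit_scale_factor(healthy, cancer, grid, alpha=0.05):
    """Scan candidate factors; return the best one, its p-value, and the accepted range."""
    rows = []
    for a in grid:
        d = ks_distance([a * h for h in healthy], cancer)
        rows.append((a, d, ks_pvalue(d, len(healthy), len(cancer))))
    best_a, _, best_p = min(rows, key=lambda row: row[1])
    accepted = [a for a, _, p in rows if p >= alpha]
    accepted_range = (min(accepted), max(accepted)) if accepted else None
    return best_a, best_p, accepted_range


# Synthetic data only: 100 log-spaced "healthy" values and an exact 2.5-fold copy.
healthy = [math.exp(i / 50) for i in range(100)]
cancer = [2.5 * h for h in healthy]
grid = [k / 100 for k in range(100, 501)]  # candidate factors 1.00, 1.01, ..., 5.00

best_a, best_p, accepted_range = fit_scale_factor(healthy, cancer, grid)
print(best_a, best_p, accepted_range)
```

In this sketch the accepted range is reported as the smallest and largest non-rejected grid values; on real data the set of non-rejected factors need not be contiguous, so it may be worth inspecting the full list.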
We also calculate the multiplicative factors a between the cmDNA concentrations reported in healthy individuals and the concentrations of cmDNA not shed from the tumor in patients of the various cancer types. The values of a that give the smallest Kolmogorov distance between the empirical CDFs, along with the ranges not rejected at the 5% significance level, are shown in Table S2; the respective P-P plots, which also stay close to the diagonal, are shown in Fig. S3. We exclude ovarian cancer from the cmDNA concentration analysis due to insufficient data: there are only 7 ovarian cancer patients with cmDNA reported. We also calculate the multiplicative factors for cmDNA concentrations after excluding the samples with MAF less than i) 0.01%, ii) 0.015%, and iii) 0.02%, since mutations detected at such low MAFs may be artefactual16. We report the numbers of healthy individuals and of cancer patients with cmDNA not from the tumor for the different MAF cutoff values in Table S3. The values of the multiplicative factors are similar for all cases considered (Table S2).
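The MAF-cutoff step amounts to filtering samples before the factor is re-estimated. The snippet below sketches that filter on hypothetical records; the concentrations and MAF values are invented for illustration and do not come from the dataset.

```python
# Hypothetical records: (cmDNA concentration in fragments/mL, MAF in percent).
samples = [
    (12.0, 0.008),
    (30.5, 0.012),
    (18.2, 0.017),
    (95.0, 0.060),
    (240.0, 0.450),
]
cutoffs_percent = [0.01, 0.015, 0.02]

# Keep only samples whose MAF is at or above the cutoff (i.e. drop MAF < cutoff).
kept = {
    cutoff: [conc for conc, maf in samples if maf >= cutoff]
    for cutoff in cutoffs_percent
}
for cutoff in cutoffs_percent:
    print(cutoff, len(kept[cutoff]))
```

Each filtered list would then be passed to the same factor-estimation procedure as the unfiltered cmDNA data.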
Dependence of cfDNA concentrations on the sample analysis methodology
The reported cfDNA concentrations depend on the method used to analyze the samples. In a 2017 study17, Cohen et al. analyzed samples from 182 healthy individuals using a different methodology, and reported significantly higher cfDNA concentrations (median 4.25 ng/mL, Kolmogorov-Smirnov test p-value < 0.001) than in the dataset we use (Cohen et al. 2018)11. The cfDNA concentrations in healthy individuals in the two studies differ by the multiplicative factor 1.57 (1.34-1.79), meaning that the distributions of cfDNA concentrations are similar in shape, although shifted (Fig. S5).
Effect of anesthesia on the multiplicative factor
The cancer-specific multiplicative factor may be affected by therapy or anesthesia. Anesthesia has been reported to reduce the concentration of cfDNA ∼2.5-fold18. Cohen et al. (2017)17 report cfDNA concentrations from 182 healthy individuals and 221 patients with stage I-III pancreatic cancer. As many pancreatic cancer patients in this study had their blood sample taken after administration of anesthesia, we expect the multiplicative factor for pancreatic cancer to be up to 2.5-fold lower than in Cohen et al. (2018)11, which excluded patients for whom anesthesia was known to have been administered prior to sample collection. Indeed, we find that the multiplicative factor for pancreatic cancer from Cohen et al. (2017)17 is 1.45 (1.30-1.59) (Fig. S6). This factor range is ∼2 to ∼2.5 times lower than the multiplicative factor for pancreatic cancer in the Cohen et al. (2018)11 data, as expected.
Acknowledgements
The authors thank Bert Vogelstein for helpful comments on the manuscript. This work is supported by the National Science Foundation grant DMS-2045166. K.M. is also supported by the Pacific Institute for the Mathematical Sciences (PIMS).