Detection of early stage pancreatic cancer using 5-hydroxymethylcytosine signatures in circulating cell free DNA

Francois Collin; Yuhong Ning; Tierney Phillips; Erin McCarthy; Aaron Scott; Chris Ellison; Chin-Jen Ku; Gulfem D Guler; Kim Chau; Alan Ashworth; Stephen R Quake; Samuel Levy

doi:10.1101/422675

Abstract

Pancreatic cancers are typically diagnosed at late stage where disease prognosis is poor as exemplified by a 5-year survival rate of 8.2%. Earlier diagnosis would be beneficial by enabling surgical resection or earlier application of therapeutic regimens. We investigated the detection of pancreatic ductal adenocarcinoma (PDAC) in a non-invasive manner by interrogating changes in 5-hydroxymethylation cytosine status (5hmC) of circulating cell free DNA in the plasma of a PDAC cohort (n=51) in comparison with a non-cancer cohort (n=41). We found that 5hmC sites are enriched in a disease and stage specific manner in exons, 3’UTRs and transcription termination sites. Our data show that 5hmC density in H3K4me3 sites is reduced in progressive disease suggesting increased transcriptional activity. 5hmC density is differentially represented in thousands of genes, and a stringently filtered set of the most significant genes exhibited biology related to pancreas (GATA4, GATA6, PROX1, ONECUT1) and/or cancer development (YAP1, TEAD1, PROX1, ONECUT1, ONECUT2, IGF1 and IGF2). Regularized regression models were built using 5hmC densities in statistically filtered genes or a comprehensive set of highly variable gene counts and performed with an AUC = 0.94-0.96 on training data. We were able to test the ability to classify PDAC and non-cancer samples with the elastic net and lasso models on two external pancreatic cancer 5hmC data sets and found validation performance to be AUC = 0.74-0.97. The findings suggest that 5hmC changes enable classification of PDAC patients with high fidelity and are worthy of further investigation on larger cohorts of patient samples.

Introduction

Translational research using genomic and proteomic technologies has provided molecular insights into the pathogenesis and biology of pancreatic cancer but has yet to yield robust diagnostic biomarkers to impact early diagnosis of disease, as reflected by a low overall 5-year survival rate of 8.2%^1,2. Pancreatic cancer often presents late and has few symptoms, at which point only 10-20% of patients are eligible for surgical resection ². Pancreatic ductal adenocarcinoma (PDAC) and its variants account for more than 90% of all pancreatic malignancies ³ with the next most common sub-type being neuroendocrine tumors ². Tobacco smoking confers a two- to three-fold higher risk of pancreatic cancer and also demonstrates a dose-risk relationship, while contributing to approximately 15 to 30% of cases ², with smokers diagnosed 8 to 15 years younger than non-smokers^4,5. Family history is contributory in approximately 10% of cases, and germline mutations in genes such as BRCA2, BRCA1, CDKN2A, ATM, STK11, PRSS1, MLH1 and PALB2 are associated with pancreatic cancer with variable penetrance ².

The management of PDAC presents physicians with challenges along the entire clinical spectrum, including early detection in high risk individuals, early diagnosis of patients with symptoms or imaging findings, prognostication of outcomes and prediction of therapeutic responsiveness. Collectively these factors have engendered intensive efforts in translational research to identify and validate biomarkers with sufficient clinical performance metrics to improve decision algorithms and resultant clinical outcomes. Current guidelines in PDAC management are limited to two biomarker recommendations tested in an invasive fashion in cystic fluid. First, carbohydrate antigen 19-9 (CA 19-9) guides surgery decisions, use of adjuvant therapy, or the detection of post-operative tumor recurrence, however, the utility is limited because 10% of the population does not secrete the antigen⁶. Second, carcinoembryonic antigen (CEA) concentration determination from cyst fluid is used to distinguish higher risk mucinous from non-mucinous cysts^7,8, thereby mitigating risk. Among the inherited risk factors are genomic mutations such as BRCA2, which confers a 3.5-fold risk in carriers, with the probability of a germline mutation between 6 to 12% in PDAC patients with a first-degree relative diagnosed with PDAC⁹.

Molecular analyses of pancreatic cancer genomes have revealed activating mutations in KRAS and inactivation of CDKN2A, TP53 and SMAD4, either through point mutation or copy number changes at >50% population frequency^10–12, however much mutational heterogeneity exists rendering this subset of genes incomplete for the diagnosis of patients. Molecular subtyping of pancreatic tumors using mutational-based data¹¹ or gene expression signatures^13–15 have not yet seen clinical applicability. Other forms of epigenetic data have focused on chromatin-based post-translation modifications and the methylation status of cytosine based in DNA.

The control of DNA state and chromatin regulation have been observed to underpin the onset and progression of oncologic disease^16,17. DNA methylation status of cytosine bases has been shown to associate with transcriptional regulation of gene expression. DNA methylation in promoters tends to associate with gene silencing whereas demethylation is associated with gene activation¹⁸. More recently, detailed understanding of demethylation has been enabled with precision around intermediate states during demethylation activation^19,20. Specifically the oxidation of these methyl group via TET enzymes to 5-hydroxymethyl cytosine (5hmC) have yielded novel signatures that have enable the definition of cellular states ²¹, as well as the identification of cancer in the cell free state^22–24.

Previously, molecular signatures have been shown in circulating cell free DNA (cfDNA) based on 5-hydroxymethylation that may define the tissue of tumor origin in a variety of disease types²². Therefore, we embarked on a case-control study aimed at investigating whether DNA 5hmC signatures were present in the blood of patients with PDAC compared to a cohort of non-disease individuals. We also investigated whether these signatures enabled the discrimination between cancer and non-cancer patients.

We found that in our study population, PDAC patients possess many thousands of genes with an altered hydroxymethylome compared to non-disease individuals. Furthermore, filtering to those genes with the most differentially hydroxymethylated states reveals genes that have been previously implicated in pancreas development or pancreatic cancer. This biologically significant gene set performs well in the construction of predictive models to discriminate PDAC from non-disease, suggesting that the measurement of 5hmC in cfDNA merits further investigation for the detection and classification of PDAC.

Results

Clinical cohort and Study Design

Plasma specimens from 92 subjects without or with pancreatic ductal adenocarcinoma (PDAC) were collected at multiple institutions in different geographic regions of the United States and Germany. These PDAC and non-cancer patient samples satisfied the study inclusion criteria, which included a minimum subject age of 18 years as well as confirmed pathologic diagnosis of adenocarcinoma of any subtype at the time of biopsy or surgical resection for subjects in the cancer cohort (Figure 1A). The non-cancer cohort was identified as satisfying the study inclusion criteria and patients were specifically negative for any form of cancer. Neither cohort were being treated with medication for disease at the time of blood collection, which was prior to any biopsy or surgical resection in the cancer cohort. There were no statistically significant differences in subject age or gender between the two cohorts, but there was a statistically significant greater tobacco exposure in the PDAC cohort, as expected given smoking is common risk factor pancreatic cancer.

Figure 1: Study Design and Patient Cohorts Employed.

A. Schematic depicting study cohorts employed. PDAC, n = 51, Non-cancer, n=41 and pooled non-cancer replicates were include across multiple 5hmC assay processing and sequencing batches.

B. Schematic depicting sample processing workflow incorporating alternating flowcell constructs according to subject sex for detection of sample swaps.

Sequencing results and metrics

Filtering criteria to enable the determination of high quality 5hmC libraries were established yielding 51 PDAC and 41 non-cancer subjects. These criteria were established from previous studies²² and extensive analysis did not reveal batch processing effects occurring specifically in either study cohorts. PDAC samples that were exclusively either male or female were combined with non-cancer samples that were exclusively either female or male in a ratio of 2:1 respectively following a block randomization scheme. This generated two batch types for the study (Figure 1B) and enabled the identification of sample swaps, none of which were detected. A single pooled non-cancer sample (30 non-cancer donor were combined into one plasma pool from which cfDNA was isolated) served as a technical/process control for the each of the batches in the study. 5hmC enrichment libraries were sequenced to produce a median number of unique read pairs of 9.1 and 10.7 million in the PDAC and non-cancer cohorts respectively.

Distributions of 5hmC densities into functional regions in PDAC and non-cancer cohort

To gain an understanding of the functional genomic regions possibly regulated by hydroxymethylation, we first determined 5hmC enriched loci, as measured by increased read density and detection as peaks by MACS2. The vast majority of 5hmC loci occur on average in non-coding regions of the genome (intronic, transposon repeats – SINES and LINEs, and intergenic - Figure 2A) with no preferential distribution in the PDAC or non-cancer cohort. Despite the high frequency of 5hmC occurrence, these functional regions exhibit low enrichment (intron, Figure 2B) or even depletion of 5hmC sites (intergenic and LINE elements, Figure 2B) relative to the genome background. Instead enrichment occurs in promoters, UTRs, exons, transcription termination sites (TTS) and SINE elements, as measured by increased relative fold change compared to the genome background. Significant differences in enrichment of 5hmC peaks over functional regions were observed in a disease cohort specific manner. Increases in enrichment in PDAC were measured in exons, 3’UTR and TTS whereas decreases were found in promoter and LINEs, which themselves were either enriched or depleted respectively (Figure 2C). These global changes found to occur in a statistically significant manner in each cohort were also found to occur in a cancer stage specific manner, with gradual increases (exon, 3’UTR and TTS) or decreases (promoter and LINE) in later stage patients (Figure 2D).

Figure 2. Differential enrichment of 5hmC in functional genomic regions in PDAC compared with non-cancer (NC) samples.

A. 5hmC peak distribution in genomic features, note non-coding feature have larger number of peaks.

B. Enrichment analysis (Y-axis = log2 (PDAC/non-cancer)) shows that gene-based features, SINEs and Alus are enriched in 5hmC peaks in both PDAC and non-cancer cohorts. Intergenic, LINEs and L1s are depleted of 5hmC peaks.

C. Box plots depicting statistically significant changes of 5hmC peaks in promoter and LINE elements in pancreas cancer samples (decrease accumulation). Exons, 3’UTR and translation termination sites are enriched in cancer samples. Y-axis = log2 (PDAC/non-cancer)

D. Box plots depicting statistically significant changes of 5hmC peaks in functional regions across pancreatic cancer stages.

E. Box plots depicting 5hmC peak depletion in H3K4me3 and H3K27ac histone marks in the PDAC cohort (top panel) and ongoing H3K4me3 depletion observed in later stage disease (bottom panel).

F. 5hmC occupancy in the PANC- 1 cell line and normal pancreas histone map depicting variable occupancy in H3K4Me3 with depletion at the center of the mark and complementary increase in 5hmC in H3K4Me1. This supports increasing gene transcription preferentially in the PDAC cohort. Y-axis = normalized density of 5hmC counts in 10 bp windows. Dotted red lines = PDAC patient, one per line, dotted blue lines = non-cancer patients, one per line. Solid red line = average density of normalize 5hmC counts across all PDAC patients, solid blue line = average density of normalize 5hmC counts across all non-cancer patients.

Next, we investigated 5hmC occupancy, and its associated changes in PDAC, with respect to chromatin state. Post-translational modifications such as methylation and acetylation on histone proteins were inferred in relation to 5hmC occupancy, using the existing histone maps from the pancreatic cancer cell line, PANC-1, for which epigenetic data were made available by ENCODE²⁵. Notably, reduced overlap with 5hmC was observed in PDAC coincident with loci associated with H3K4me3 and H3K27ac, both of which mark transcriptionally active states (Figure 2E). There was no significant difference in global 5hmC overlap observed between PDAC and non-cancer cohorts in H3K4me1-associated loci, which mark enhancer regions (Figure 2E). Furthermore, 5hmC in H3K4me3 associated loci are significantly reduced with progressive disease as observed when the PDAC cohort is subdivided into disease stages (p=0.0003 – Figure 2E bottom right panel). Conversely, there were no statistically significant changes detected in 5hmC globally over H3K4me1 and H3K27ac associated loci with progressive disease (Figure 2E – bottom panels). Amongst the three histone maps from PANC-1 cell line, H3K4me1 has the most abundant overlap with 5hmC occupancy in both the PDAC and non-cancer cohort (Supplementary Figure 1). The PDAC samples have an increased 5hmC intensity over H3K4me1- associated sites compared to the non-cancer cohort (Figure 2F – bottom panel). Conversely, H3K4me3 loci exhibit the lowest 5hmC intensity in both cohorts (Figure 2F – top panel) and the least abundant overlap with 5hmC occupancy (Supplementary Figure 1).

Identification of disease specific genes from plasma samples

Differential analysis of 5hmC densities in genes revealed 6,496 and 6,684 genes with an increased and decreased 5hmC density respectively in PDAC compared to non-cancer samples (Figure 3A – adjusted p-value < 0.05). Further filtering of this gene set (fold change ≥ 1.5 in PDAC versus non-cancer and average log 2 CPM ≥ 4 counts, 142 genes total) revealed annotated genes with increased 5hmC density and whose biology is related to pancreas development (GATA4²⁶, GATA6²⁶, PROX1²⁷, ONECUT1²⁸) and/or implicated in cancer (YAP1²⁹, TEAD1²⁹, PROX1³⁰, ONECUT2/ONECUT1, IGF1 and IGF2). Inspection of the MSigDB for relevant pathways comprising the 142 genes with enriched 5hmC densities revealed a preponderance of pathways down-regulated in liver cancer (5 of the top 10 most significant pathways – Table 2). The differential representation analysis coupled with filtering (fold change ≤ 1.5 in PDAC versus non-cancer and log CPM of 5hmC ≥ 4) also revealed 178 genes with a decreased 5hmC density in pancreas cancer cfDNA. Closer inspection of these pathways with decreased 5hmc representation revealed fundamental pathways in immune system regulation (3 of the top 10 most significant pathways – Table 3).

View this table:

Table 1.

Clinical Characteristics of Non-Cancer and Cancer Subject Cohorts.

View this table:

Table 2.

Top 10 pathways represented by 142 genes with increased 5hmC density in PDAC samples versus non-cancer samples.

View this table:

Table 3.

Top 10 pathways represented by 178 genes with decreased 5hmC density in PDAC samples versus non-cancer samples.

Figure 3. Identification of statistical significant 5hmC changes in genes in the PDAC cohort, biological significant of the gene set and ability to partition between PDAC and non-cancer samples using 5hmC counts.

A. MA-Plot showing all differential represented genes and heatmap showing 5hmC representation on the most significant genes. Adjusted p-value < 0.05, NC = non-cancer cohort. Red points mark genes with increased 5hmC density in PDAC versus NC. Green points mark genes with decreased 5hmC density in PDAC versus NC. Red and Green are significant at the adjusted p-value < 0.05.

B. GSEA using differentially 5hmC enriched genes reveals >20% KEGG pathways are both up and down-represented via 5hmC changes in pancreas cancer versus non-cancer samples. Also >30% immune pathways are down-represented in pancreas cancer versus non-cancer samples. Hallmark =, C2 = Curated gene sets inclusive of Biocarta, KEGG and Reactome, C5.BP = GO Biological processes, C6 = Oncogenic signatures, C7 = Immunologic signatures.

C. MDS using log (counts per million) of 13,180 genes with statistically significant (FDR= 0.05) increase or decrease in 5hmC. Note reasonable partitioning of PDAC from non-cancer samples.

D. PCA using log (counts per million) of 320 genes with statistically significant (FDR= 0.05) and filtered for increased PDAC representation (|log2(5hmCPDAC/5hmCnon-cancer)| >= 0.58 and log2(average representation) >= 4) increase or decrease in 5hmC. Note reasonable partitioning of PDAC from non-cancer samples despite an order of magnitude smaller gene set than Figure 3 C.

E. Heatmaps depicting hierarchical clustering results employing 320 genes (rows in heatmaps) to show how labelled samples (columns in heatmap) can be partitioned using log(CPM) 5hmC counts. Note almost perfect partitioning of the Song et al data set (top heatmap) versus incomplete partitioning of the Li et al data set (bottom heatmap).

Expanding gene set enrichment analysis to include the full data set of all genes revealed that more than 30% of immune related pathways have a reduced 5-hydroxymethylation across early and late stage PDAC (Figure 3 B, Table 4). Multidimensional scaling analysis (MDS) using either the 13,180 genes with high variation in 5hmC counts (Figure 3 C) or the 320 genes filtered at the extremes of 5hmC representation in PDAC (Figure 3 D), reveal partitioning of the PDAC samples from the non-cancer equally well, using statistically filtered genes that form a biologically relevant set. Furthermore, the 320 genes were employed in a hierarchical clustering analysis, which enabled better partitioning of the pancreas and healthy 5hmC data from Song et al²² compared with similar data from Li et al²⁴ (Figure 3E). In summary, we have been able to find a differentially represented gene set whose biological functions are congruent with both pancreatic development and cancer more broadly and the hydroxymethylation densities of these genes alone enable the partitioning of PDAC from non-cancer.

View this table:

Table 4.

MSigDB pathways containing genes with modification in 5hmC in PDAC. Down = number of pathways with genes that have reduced 5hmC in PDAC, Up = number of pathways with genes that have increased 5hmC in PDAC, [Up, Down]/Total.pathway = Down and Up values expressed as a ratio. Hallmark =, C2 = Curated gene sets inclusive of Biocarta, KEGG and Reactome, C5.BP = GO Biological processes, C6 = Oncogenic signatures, C7 = Immunologic signatures. Note that largest magnitude of change in the most gene rich set is a decreased 5hmC in immunologic genes.

Predictive models for the detection of pancreatic cancer in cfDNA

We performed regularized logistic regression analysis in order to determine whether gene-based features are present in the PDAC and non-cancer cohorts that enable the classification of patient samples. The full set of 92 patient samples were partitioned into a training and test set comprising 75% and 25% of the patient data respectively and 65% of the genes with the most variable 5hmC count were employed for model selection. Two methods of regularization were employed, elastic net (glmnet) and lasso (glmnet2)³¹. Other modeling approaches were explored such as random forest, support vector machines and neural nets in a preliminary analysis and were found to have inferior performance on the training data.

Both regularization methods require specifying hyper-parameters which control the level of regularization used in the fit. These hyper-parameters were selected based on out-of-fold performance on 30 repetitions of 10-fold cross-validated analysis of the training data. Out-of-fold assessments are based on the samples in the left-out fold at each step of the cross-validated analysis. The training set yielded an out-of-fold performance metric, Area Under Curve (AUC), of 0.96 (elastic net and lasso) with an internal sample test AUC of 0.84 (elastic net) and 0.88 (lasso) (Fig 4B). The distribution of probability scores indicated that within the training data both models classify well (Figure 4B) but that improved robustness and stability of scoring was found with the elastic net model as evidenced by reduced variation in probability scores observed during repeated cross-validations. Next, the training model was tested on two external validation set of patient samples. These include pancreatic cancer and healthy samples from Li et al²⁴ (pancreas subtype was not specified, 23 pancreas, 53 healthy) and Song et al²² (pancreas subtype specified as adenocarcinoma, 7 pancreas, 10 healthy – Supplemental Figure 2). The validation set exhibited a performance with AUC = 0.78 (elastic net and lasso) in the Li et al data and AUC = 0.99 (elastic net) and 0.97 (lasso) in the Song et al data (Figure 4C).

Figure 4: Identification of a 5hmC signature that differentiates PDAC from non-cancer samples.

The effect of feature selection on prediction performance was evaluated by filtering the initial set of significant genes (Figure 3B) to satisfy a 1.5 fold differential 5hmC representation in the PDAC cohort with median representation of gene counts of log2 average 5hmC representation > 4. This filtering approach was applied on 75% sample data, reserving the remaining 25% for subsequent testing (see below). The same regularized regression models were built using this set of 287 genes with increased 5hmC and 343 genes with decreased 5hmC density, employing a similar setup for training and testing as defined previously and found training set AUC = 0.96 (elastic net) and 0.94 (lasso). Not surprisingly, internal testing yielded a high performance with AUC = 0.92 (elastic net) and 0.93 (lasso). Of greater interest, was the performance on external data sets with AUC = 0.74 (elastic net) and 0.67 (lasso) for Li et al data and AUC = 0.97 (elastic net) and 0.94 (lasso) for Song et al data. This suggests that statistically filtered genes that are biologically relevant to pancreatic cancer and/or pancreas development do not perform much better than an algorithmically driven selection of features during regression training, as has been shown elsewhere³².

The final models fitted to the 65% most variable 5hmC gene features in the training set, using hyper-parameter values determined from the training set data analysis, were fitted to the whole cohort of PDAC and non-cancer samples and this yielded models with 109 genes (elastic net) and 47 genes (lasso). The genes in the models were found to possess PDAC versus non-cancer t-scores that are concordant with both the Li et al and Song et al data sets (Figure 5).

Figure 5.

Comparison of t-scores of 5hmC density fold difference between PDAC and non-cancer (NC) cohorts as found in (A) Song et al and (B) Li et al, each compared to this study. All genes score are represented in grey, elastic net model genes in green and lasso model genes in red. The size of each green and red dot represents the relative contribution of that gene in the model.

Discussion

This study was focused on the discovery of cfDNA specific hydroxymethylation-based biomarkers that may facilitate the development of molecular diagnostic tests to detect pancreatic cancer at earlier stages. Our data highlight the ability to detect differentially hydroxymethylated genes whose underlying biology shows association with both pancreas and cancer development as well as established trends in chromatin mark maps and other functional regions of the genome. Furthermore, regularized regression methods were used to build models from (i) statistically filtered genes that form a biologically relevant set and (ii) a comprehensive gene set found to be highly variable, and this yielded models with AUC = 0.94- 0.96 with an external data set validation AUC = 0.74-0.97 (elastic net models).

The 5hmC signal was readily found to be enriched in gene-centric sequence types (promoter, exons, UTR and TTS), as well as transposable elements like SINEs (enriched) and LINEs (depleted) (Figure 2A, B). Such hydroxymethylcytosine changes in functional regions have been reported in cfDNA from colorectal²⁴, esophageal^23,33 and lung cancer²³. In a similar manner, PDAC specific gains or losses in hydroxymethylation were observed in functional regions in our data. In addition to enrichment and depletion of 5hmC in functional regions, there was a novel PDAC specific 5hmC increase in exons, TTS and 3’UTR and a 5hmC decrease in promoters and LINE elements (Figure 2 C). In embryonic stem cells, 5-hydromethylation decreases in the promoter region have been shown to associate with gene transcription³⁴. An increase in disease relevant transcription may be implicitly supported in our PDAC data by the 5hmC increase in gene-centric features mentioned earlier, as well as an apparent decreasing trend of 5hmC in promoter regions toward late stage PDAC (Figure 2 D).

Dynamic changes in chromatin have been shown to control cell development and transition of cells with oncogenic potential³⁵. The PDAC specific decrease of 5hmC in H3K4me3 loci appear to be coincident with a non-statistically significant increase of 5hmC in H3K4me1 (Figure 2 E). These DNA hydroxymethylation patterns appear to complement each other in genomic location and also the histone marks they occupy (Figure 2 F) and also suggest disease specific increases in gene transcription via chromatin modifications, given the known permissive transcriptional function associated with H3K4me3/me1³⁶. Precise 5hmC patterning around known functional elements of the genome suggests a broader function for hydroxymethylation in the epigenetic control of transcriptional processes. Additional work will reveal the extent to which models predictive of PDAC can be built from a combination of gene-specific, chromatin mark and transposable elements detected in cfDNA.

In this study, we employed coarse resolution of hydroxymethylation at the gene-based level in PDAC and yet were able to find genes whose increased 5hmC signals in highlighted pathways implicated in liver cancer (Table 2). We note that MSigDB does not currently contain pathways annotated for pancreatic cancer³⁷ and further that pancreas typically has groups of expressed genes shared with liver and salivary gland (https://www.proteinatlas.org/humanproteome/pancreas). We employed two approaches for gene set enrichment analysis, either using genes with differentially decreased 5hmC or via performing GSEA on all reporting genes, and found close to one third of immune system pathways were implicated. Assuming the strong association between 5hmC density and gene transcription, one interpretation of this result is that immune system function is decreased in PDAC patients. Inspection of individual genes that were either significantly increased or decreased in 5hmC density reveals genes implicated in normal pancreas development, for instance the transcription factors GATA4, GATA6, PROX1, ONECUT1/2, and also genes whose increased expression is implicated in cancer like YAP1, TEAD, PROX1, ONECUT2, ONECUT1, IGF1 and IGF2. The impact of increased relative 5hmC representation of transcription factor genes like GATA4, GATA6, PROX1, ONECUT1/2 in PDAC patient, whose involvement in early pancreatic development, suggest a reversion to a stem-like state. This is further supported by the fact that some of the significant genes with increased 5hmC representation identify a stem cell pathway (BOQUEST_STEM_CELL_UP, in top 20 mSigDB pathways from 142 genes with increased 5hmC in PDAC).

Identifying genes whose 5hmC densities are significantly changed in PDAC, leads to an enrichment of genes with annotated relevant biology which can be used to build regularized regression models, whose performance matched models built on the more comprehensive set of variable genes. This gives us good confidence that our models, whose performance is high (training AUC = 0.94-0.96 with an external data set validation AUC = 0.74-0.97), are measuring underlying biological signals relevant to PDAC. Our current external data set performance may be somewhat explained by the small sample size of external data sets (Li et al, 23 pancreas, 53 healthy and Song et al, 7 pancreas, 10 healthy). Also, whilst our external validation AUC on Li et al data was generally lower than for Song et al data, we note that Li et al pancreatic data were distributed with a mode around stage 3 disease versus a mode at stage 2 for this study (Song et al was broadly distributed across all stages – Supplementary Figure 2), thus our predictive model may be better suited for the detection of earlier versus later stage disease. Other patient characteristics (such as histological subtype and smoking, etc.) may have also differed in these independent study sets.

Despite the large number of differentially hydroxymethylated genes the regularized regression models included 100 genes or less. However the fact that 13,180 differentially hydroxymethylated genes were detected suggest that other biological signals may also reside in our data set. Smoking status is a known risk factor for PDAC up to 20 years post smoking cessation and DNA methylation changes have been associated with tobacco-based toxins³⁸. In our retrospective case-control designed study, ever smokers constituted 59% and 49% of PDAC and non-cancer cohorts respectively, indicating that ever smokers are well represented in each cohort. Consequently, we do not suspect that smoking association in our PDAC cohort could account for the significantly hydroxymethylated genes found. However, a more extensive future study focused on sub-partitioning PDAC and non-cancer patient into never and ever smokers with pack-year characteristics will enable us to address the impact of smoking on the hydroxymethylome in PDAC patients. These pancreatic cancer risk parameters combined into a clinical relevant, intent to test population based study, will further our current set of findings beyond our current case-control cohort study, which numbers less than 100 participants. Further consideration of disease-related clinical parameters will enable us to explore hydroxymethylcytosine features with the aim of yielding refined signals capable of earlier diagnosis of PDAC.

Methods

Clinical cohorts and study design

A case-control study was performed using plasma obtained from subjects without (termed non-cancer) and with pancreatic cancer who provided informed consent and contributed biospecimens in studies approved by the Institutional Review Boards (IRBs) at participating sites in the United States and Germany. Plasma samples for the non- cancer cohort were obtained from subjects enrolled prospectively at five sites in the United States, following review and approval of the study protocol by each site’s participating investigator(s).

Cancer cohort

Plasma samples for the cancer cohort were obtained from subjects who had undergone management for pancreatic cancer in the United States or Germany, and also provided consent for use of blood specimens for archival storage and retrospective analyses. Criteria for subject eligibility for inclusion in the analysis included age greater than or equal to 21 years for all subjects, with additional requirements for the cancer cohort including: 1) no cancer treatment, e.g., surgical, chemotherapy, immunotherapy, targeted therapy, or radiation therapy, prior to study enrollment and blood specimen acquisition; and 2) a confirmed pathologic diagnosis of adenocarcinoma inclusive of all subtypes.

Non-cancer cohort

Subject exclusion criteria for the non-cancer cohort also included any of the following: prior cancer diagnosis within prior six months; surgery or invasive procedure requiring general anesthesia within prior month; non-cancer systemic therapy associated with molecularly targeted immune modulation; concurrent or prior pregnancy within previous 12 months; history of organ tissue transplantation; history of blood product transfusion within one month; and major trauma within six months. Clinical data required for all subjects included age, gender, smoking history, and both tissue pathology and grade, and were managed in accordance with the guidance established by the Health Insurance Portability and Accountability Act (HIPAA) of 1996 to ensure subject privacy.

Plasma collection

Plasma was isolated from whole blood specimens obtained by routine venous phlebotomy at the time of subject enrollment. For cancer subjects, whole blood was collected in K3EDTA tubes (Sarstedt, Nümbrecht, Germany) with isolation of plasma within 4 h of phlebotomy by centrifugation at 1,500g for 10 min at RT, followed by transfer of the plasma layer to a new tube for centrifugation at 3,000g for 10 min at RT, with plasma aliquots used for isolation of cell-free DNA (cfDNA) or stored at −80°C.

For non-cancer subjects, whole blood was collected in Cell-Free DNA BCT^® tubes according to the manufacturer’s protocol (Streck, La Vista, NE) (https://www.streck.com/collection/cell-free-dna-bct/). Tubes were maintained at 15 °C to 25 °C with plasma separation performed within 24 h of phlebotomy by centrifugation of whole blood at 1600 x g for 10 min at RT, followed by transfer of the plasma layer to a new tube for centrifugation at 16,000 x g for 10 min. Plasma was aliquoted for subsequent cfDNA isolation or storage at −80°C.

cfDNA isolation

cfDNA was isolated using the QIAamp Circulating Nucleic Acid Kit (QIAGEN, Germantown, MD) following the manufacturer’s protocol excepting the omission of carrier RNA during cfDNA extraction. Two milliliter plasma volumes (cancer) or four milliliter plasma volumes (non-cancer) were lysed for 30 minutes prior to collection of nucleic acids; all cfDNA eluates were collected in a volume of 60 µl buffer. All cfDNA eluates were quantified by Bioanalyzer dsDNA High Sensitivity assay (Agilent Technologies Inc, Santa Clara, CA) and Qubit dsDNA High Sensitivity Assay (Thermo Fisher Scientific, Waltham, MA) was employed to ensure the absence of contaminating high molecular weight DNA emanating from white blood cell lysis.

5-hydroxymethyl Cytosine (5hmC) assay enrichment

Sequencing library preparation and 5hmC enrichment was performed as described previously (Song et al). cfDNA was normalized to 10 ng total input for each assay and ligated to sequencing adapters. 5hmC bases were biotinylated via a two-step chemistry and subsequently enriched by binding to Dynabeads M270 Streptavidin (Thermo Fisher Scientific, Waltham, MA). All libraries were quantified by Bioanalyzer dsDNA High Sensitivity assay (Agilent Technologies Inc, Santa Clara, CA) and Qubit dsDNA High Sensitivity Assay (Thermo Fisher Scientific, Waltham, MA) and normalized in preparation for sequencing.

DNA sequencing and alignment

DNA sequencing was performed according to manufacturer’s recommendations with 75 base-pair, paired-end sequencing using a NextSeq550 instrument with version 2 reagent chemistry (Illumina, San Diego, CA). Twenty four libraries were sequenced per flowcell and raw data processing and demultiplexing was performed using the Illumina BaseSpace Sequence Hub to generate sample-specific FASTQ output. Sequencing reads were aligned to the hg19 reference genome using BWA-MEM with default parameters³⁹.

Peak Detection

BWA-MEM read alignments were employed to identified regions or peaks of dense read accumulation that mark the location of a hydroxymethylated cytosine residue in a CpG content. Prior to identified peaks BAM files containing the locations of aligned reads were filtered for poorly mapped (MAPQ < 30) and not properly paired reads. 5hmC peak calling was carried out using MACS2 (https://github.com/taoliu/MACS) with a p-value cut off = 1.00e-5. Identified 5hmC peaks residing in “blacklist regions” as defined elsewhere (https://sites.google.com/site/anshulkundaje/projects/blacklists) and read date on chromosomes X, Y and mitochondrial genome were also removed. Computation of genomic feature enrichment overlap 5hmC peaks were performed using the HOMER software (http://homer.ucsd.edu/homer/) with default parameters.

Chromatin modifications (H3K4me1, H3K4me3 and H3K27ac) were identified in histone maps of the pancreatic cancer cell line PANC-1 and were downloaded from ENCODE ChIP-Seq repository (https://genome.ucsc.edu/encode/dataMatrix/encodeChipMatrixHuman.html Determination of enrichment were calculated via Odds ratio using the Fisher Exact Test via the program bedtools fisher. For comparisons between PDAC and non-cancer, the Wilcoxon test was used, and for across stages comparison, the Kruskal-Wallis Test was employed.

Differential Representation Analysis

For the purpose of reliably identifying gene bodies with differential representation between the PDAC and the non-cancer groups, we closely followed the RNA-Seq workflow outlined in Law et al. 2016⁴⁰, including much of the preliminary QC steps. In brief, the analysis includes data pre-processing by adopting the following workflow: (i) transforming the data from raw counts to log2(counts per million), (ii) removing genes that are weakly represented, (iii) normalizing the gene representation distributions, and (iv) performing unsupervised clustering of samples. To accomplish differential representation analysis, we applied the following steps: (i) creating a design matrix and PDAC vs non-cancer cohorts, (ii) removing heteroscedascity from the data, fitting the linear models for the comparison of interest, PDAC vs non-cancer, (iv) examining the number of differentially represented genes.

In most of these analysis steps the default settings were used when appropriate. To remove weakly represented genes, we excluded genes that did not have greater than 3 counts per million reads in at least 20 samples. This filter excludes roughly 12% of the genes. For the identification of the significantly differentially represented regions we used the method of Benjamini and Hochberg⁴¹ to obtain p-values adjusted for multiple comparisons. In this report we use adjusted p-value and false discovery rate (FDR) interchangeably.

Predictive Modelling

For the purpose of assessing the feasibility of building classifiers that can discriminate between PDAC and non-cancer samples based on the 5hmC representation of gene bodies, we evaluated to performance of two forms of regularized logistic regression models commonly used in the classification context, where the number of examples are few and the number of features large; the lasso and the elastic net. See Friedman et al. (2010)³¹ for a description of the general elastic net precedure of which the lasso is a special case. Software implementation of these methods can be found at https://cran.r-project.org/web/packages/glmnet/index.html. Weakly represented genes were excluded from analysis as described in the section on Differential Representation Analysis.

All training and fitting was done on 75% of the samples selected at random in a balanced way to keep the ratio of the number of PDAC to non-cancer samples similar in both the training and testing subsets. Before any fitting, genes were filtered to include the 65% most variable genes for the model fitting task. The filter was designed using the training samples only and was done in a way to ensure that genes of all levels of representation were included.

Both regularization methods assessed, the lasso and the elastic net, require specifying hyper- parameters which control the level of regularization used in the fit. Hyper-parameters were selected based on out-of-fold performance on 30 repetitions of 10-fold cross-validated analysis of the training data. Out-of-fold assessments are based on the samples in the left-out fold at each step of the cross-validated analysis. The out-of-fold performance of the models fitted with hyper-parameter values set at the optimal values might yield a slightly optimistic assessment of performance. The performance of these models applied to the test set should provide less biased estimate of performance, although generalizability to external datasets is not always guaranteed.

The hyper-parameter values that lead to the best out-of-fold performance were then used to fit the final models which were fitted to the entire set of samples including both training and testing subsets. The performance of these final models can thus only be evaluated based on their performance on external data sets. These do provide a sense of the generalizability of the performance observed in the local training and testing data sets.

To evaluate the effect of feature selection on prediction performance, we repeated the training and evaluation task based on a filtered set of genes that included genes found to be significantly differentially represented, having a 1.5 fold differential 5hmC representation, and a level of representation exceeding the median level (log2 CPM ≥ 4). This filter was designed based on training data statistics only.

References

1.↵
Institute, N. in SEER Cancer Stat Facts: Pancreas Cancer. (2017).
2.↵
Kleeff, J. et al. Pancreatic cancer. Nature Reviews Disease Primers 2, rdp201622 (2016).
OpenUrl
3.↵
Tempero, M. A. et al. Pancreatic Adenocarcinoma, Version 2.2017, NCCN Clinical Practice Guidelines in Oncology. Journal of the National Comprehensive Cancer Network: JNCCN 15, 1028–1061 (2017).
OpenUrl
4.↵
Anderson, M. et al. Alcohol and tobacco lower the age of presentation in sporadic pancreatic cancer in a dose-dependent manner: a multicenter study. Am J Gastroenterol 107, 1730–9
5.↵
Maisonneuve, P. & Lowenfels, A. Epidemiology of pancreatic cancer: an update. Dig Dis 28, 645–56 (2010).
OpenUrl CrossRef PubMed Web of Science
6.↵
Swords, D., Firpo, M., Scaife, C. & Mulvihill, S. Biomarkers in pancreatic adenocarcinoma: current perspectives. Onco Targets Ther 9, 7459–7467 (2016).
OpenUrl
7.↵
Fonseca, A., Kirkwood, K., Kim, M., Maitra, A. & Koay, E. Intraductal Papillary Mucinous Neoplasms of the Pancreas: Current Understanding and Future Directions for Stratification of Malignancy Risk. Pancreas 47, 272–279
8.↵
Elta, G., Enestvedt, B., Sauer, B. & Lennon, A. ACG Clinical Guideline: Diagnosis and Management of Pancreatic Cysts. Am J Gastroenterol 113, 464–479
9.↵
Lal, G. et al. Inherited predisposition to pancreatic adenocarcinoma: role of family history and germ-line p16, BRCA1, and BRCA2 mutations. Cancer Res 60, 409–16
10.↵
Biankin, A. V. et al. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes. Nature 491, 399–405
11.↵
Waddell, N. et al. Whole genomes redefine the mutational landscape of pancreatic cancer. Nature 518, 495–501 (2015).
OpenUrl CrossRef PubMed
12.↵
Jones, S. et al. Core Signaling Pathways in Human Pancreatic Cancers Revealed by Global Genomic Analyses. Science 321, 1801–1806 (2008).
OpenUrl Abstract/FREE Full Text
13.↵
Bailey, P. et al. Genomic analyses identify molecular subtypes of pancreatic cancer. Nature 531, 47–52
14.
Collisson, E. A. et al. Subtypes of pancreatic ductal adenocarcinoma and their differing responses to therapy. Nature medicine 17, 500 (2011).
OpenUrl CrossRef PubMed
15.↵
Moffitt, R. A. et al. Virtual microdissection identifies distinct tumor- and stroma-specific subtypes of pancreatic ductal adenocarcinoma. Nature genetics 47, 1168–78 (2015).
OpenUrl CrossRef PubMed
16.↵
Marazzi, I., Greenbaum, B. D., Low, D. H. & Guccione, E. Chromatin dependencies in cancer and inflammation. Nature Reviews Molecular Cell Biology 19, 245 (2017).
OpenUrl
17.↵
Feinberg, A. Epigenetic stochasticity, nuclear structure and cancer: The implications for medicine. Journal of Internal Medicine 276, 5–11 (2014).
OpenUrl
18.↵
Curradi, M., Izzo, A., Badaracco, G. & Landsberger, N. Molecular Mechanisms of Gene Silencing Mediated by DNA Methylation. Molecular and Cellular Biology 22, 3157–3173 (2002).
OpenUrl Abstract/FREE Full Text
19.↵
Kriaucionis, S. & Heintz, N. The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science (New York, N.Y.) 324, 929–30 (2009).
OpenUrl
20.↵
Tahiliani, M. et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science (New York, N.Y.) 324, 930–5 (2009).
OpenUrl
21.↵
Nestor, C. E. et al. 5-Hydroxymethylcytosine Remodeling Precedes Lineage Specification during Differentiation of Human CD4+ T Cells. Cell Reports 16, 559–570
22.↵
Song, C.-X. et al. 5-Hydroxymethylcytosine signatures in cell-free DNA provide information about tumor types and stages. Cell Res 27, 1231 (2017).
OpenUrl
23.↵
Zhang, J. et al. 5-Hydroxymethylome in Circulating Cell-free DNA as A Potential Biomarker for Non-small-cell Lung Cancer. Genomics, proteomics & bioinformatics 16, 187–199 (2018).
OpenUrl
24.↵
Li, W. et al. 5-Hydroxymethylcytosine signatures in circulating cell-free DNA as diagnostic biomarkers for human cancers. Cell Research 1–15 doi:10.1038/cr.2017.121
OpenUrl CrossRef
25.↵
Elements, D. An integrated encyclopedia of DNA elements in the human genome. doi:10.1038/nature11247
OpenUrl CrossRef PubMed Web of Science
26.↵
Huang, L. et al. Ductal pancreatic cancer modeling and drug screening using human pluripotent stem cell- and patient-derived tumor organoids. Nature medicine 21, 1364–71 (2015).
OpenUrl CrossRef PubMed
27.↵
Burke, Z. & Oliver, G. Prox1 is an early specific marker for the developing liver and pancreas in the mammalian foregut endoderm. Mechanisms of Development 118, 147–155 (2002).
OpenUrl CrossRef PubMed Web of Science
28.↵
Zhang, H. et al. Multiple, temporal-specific roles for HNF6 in pancreatic endocrine and ductal differentiation. Mechanisms of development 126, 958–73 (2009).
OpenUrl CrossRef PubMed
29.↵
Rozengurt, E., Sinnett-Smith, J. & Eibl, G. Yes-associated protein (YAP) in pancreatic cancer: at the epicenter of a targetable signaling network associated with patient survival. Signal Transduction and Targeted Therapy 3, 11 (2018).
OpenUrl
30.↵
Saukkonen, K. et al. PROX1 and β-catenin are prognostic markers in pancreatic ductal adenocarcinoma. BMC Cancer 16, 472 (2016).
OpenUrl
31.↵
Friedman, J., Hastie, T. & Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of statistical software 33, 1–22 (2010).
OpenUrl CrossRef
32.↵
Haury, A.-C. C., Gestraud, P. & Vert, J.-P. P. The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures. PloS one 6, e28210 (2011).
OpenUrl CrossRef PubMed
33.↵
Tian, X. et al. Circulating tumor DNA 5-hydroxymethylcytosine as a novel diagnostic biomarker for esophageal cancer. Cell Research 1–4 (2018). doi:10.1038/s41422-018-0014-x
OpenUrl CrossRef
34.↵
Szulwach, K. E. et al. Integrating 5-hydroxymethylcytosine into the epigenomic landscape of human embryonic stem cells. PLoS Genetics 7,
35.↵
Morgan, M. A. & Shilatifard, A. Chromatin signatures of cancer. Genes & Development 29, 238–249 (2015).
OpenUrl Abstract/FREE Full Text
36.↵
Calo, E. & Wysocka, J. Modification of Enhancer Chromatin: What, How, and Why? Molecular Cell 49, 825–837 (2013).
OpenUrl CrossRef PubMed Web of Science
37.↵
Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102, 15545–15550 (2005).
OpenUrl Abstract/FREE Full Text
38.↵
Lee, K. & Pausova, Z. Cigarette smoking and DNA methylation. Front Genet 4, 132 (2013).
OpenUrl PubMed
39.↵
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
OpenUrl CrossRef PubMed Web of Science
40.↵
Law, C. W., Alhamdoosh, M., Su, S., Smyth, G. K. & Ritchie, M. E. RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR. F1000Research 5, 1408 (2016).
OpenUrl
41.↵
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the royal statistical society. Series B (Methodological) 289–300 (1995).

View the discussion thread.

Posted September 20, 2018.

Download PDF

Supplementary Material

Citation Tools

Subject Area

Cancer Biology

Subject Areas

All Articles

Animal Behavior and Cognition (5201)
Biochemistry (11718)
Bioengineering (8724)
Bioinformatics (29132)
Biophysics (14936)
Cancer Biology (12051)
Cell Biology (17360)
Clinical Trials (138)
Developmental Biology (9406)
Ecology (14146)
Epidemiology (2067)
Evolutionary Biology (18269)
Genetics (12223)
Genomics (16768)
Immunology (11844)
Microbiology (28016)
Molecular Biology (11560)
Neuroscience (60822)
Paleontology (450)
Pathology (1864)
Pharmacology and Toxicology (3231)
Physiology (4940)
Plant Biology (10401)
Scientific Communication and Education (1680)
Synthetic Biology (2878)
Systems Biology (7333)
Zoology (1642)

[1] 1.↵
Institute, N. in SEER Cancer Stat Facts: Pancreas Cancer. (2017).

[2] 2.↵
Kleeff, J. et al. Pancreatic cancer. Nature Reviews Disease Primers 2, rdp201622 (2016).
OpenUrl

[3] 3.↵
Tempero, M. A. et al. Pancreatic Adenocarcinoma, Version 2.2017, NCCN Clinical Practice Guidelines in Oncology. Journal of the National Comprehensive Cancer Network: JNCCN 15, 1028–1061 (2017).
OpenUrl

[4] 4.↵
Anderson, M. et al. Alcohol and tobacco lower the age of presentation in sporadic pancreatic cancer in a dose-dependent manner: a multicenter study. Am J Gastroenterol 107, 1730–9

[5] 5.↵
Maisonneuve, P. & Lowenfels, A. Epidemiology of pancreatic cancer: an update. Dig Dis 28, 645–56 (2010).
OpenUrl CrossRef PubMed Web of Science

[6] 6.↵
Swords, D., Firpo, M., Scaife, C. & Mulvihill, S. Biomarkers in pancreatic adenocarcinoma: current perspectives. Onco Targets Ther 9, 7459–7467 (2016).
OpenUrl

[7] 7.↵
Fonseca, A., Kirkwood, K., Kim, M., Maitra, A. & Koay, E. Intraductal Papillary Mucinous Neoplasms of the Pancreas: Current Understanding and Future Directions for Stratification of Malignancy Risk. Pancreas 47, 272–279

[8] 8.↵
Elta, G., Enestvedt, B., Sauer, B. & Lennon, A. ACG Clinical Guideline: Diagnosis and Management of Pancreatic Cysts. Am J Gastroenterol 113, 464–479

[9] 9.↵
Lal, G. et al. Inherited predisposition to pancreatic adenocarcinoma: role of family history and germ-line p16, BRCA1, and BRCA2 mutations. Cancer Res 60, 409–16

[10] 10.↵
Biankin, A. V. et al. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes. Nature 491, 399–405

[11] 11.↵
Waddell, N. et al. Whole genomes redefine the mutational landscape of pancreatic cancer. Nature 518, 495–501 (2015).
OpenUrl CrossRef PubMed

[12] 12.↵
Jones, S. et al. Core Signaling Pathways in Human Pancreatic Cancers Revealed by Global Genomic Analyses. Science 321, 1801–1806 (2008).
OpenUrl Abstract/FREE Full Text

[13] 13.↵
Bailey, P. et al. Genomic analyses identify molecular subtypes of pancreatic cancer. Nature 531, 47–52

[14] 14.
Collisson, E. A. et al. Subtypes of pancreatic ductal adenocarcinoma and their differing responses to therapy. Nature medicine 17, 500 (2011).
OpenUrl CrossRef PubMed

[15] 15.↵
Moffitt, R. A. et al. Virtual microdissection identifies distinct tumor- and stroma-specific subtypes of pancreatic ductal adenocarcinoma. Nature genetics 47, 1168–78 (2015).
OpenUrl CrossRef PubMed

[16] 16.↵
Marazzi, I., Greenbaum, B. D., Low, D. H. & Guccione, E. Chromatin dependencies in cancer and inflammation. Nature Reviews Molecular Cell Biology 19, 245 (2017).
OpenUrl

[17] 17.↵
Feinberg, A. Epigenetic stochasticity, nuclear structure and cancer: The implications for medicine. Journal of Internal Medicine 276, 5–11 (2014).
OpenUrl

[18] 18.↵
Curradi, M., Izzo, A., Badaracco, G. & Landsberger, N. Molecular Mechanisms of Gene Silencing Mediated by DNA Methylation. Molecular and Cellular Biology 22, 3157–3173 (2002).
OpenUrl Abstract/FREE Full Text

[19] 19.↵
Kriaucionis, S. & Heintz, N. The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science (New York, N.Y.) 324, 929–30 (2009).
OpenUrl

[20] 20.↵
Tahiliani, M. et al. Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science (New York, N.Y.) 324, 930–5 (2009).
OpenUrl

[21] 21.↵
Nestor, C. E. et al. 5-Hydroxymethylcytosine Remodeling Precedes Lineage Specification during Differentiation of Human CD4+ T Cells. Cell Reports 16, 559–570

[22] 22.↵
Song, C.-X. et al. 5-Hydroxymethylcytosine signatures in cell-free DNA provide information about tumor types and stages. Cell Res 27, 1231 (2017).
OpenUrl

[23] 23.↵
Zhang, J. et al. 5-Hydroxymethylome in Circulating Cell-free DNA as A Potential Biomarker for Non-small-cell Lung Cancer. Genomics, proteomics & bioinformatics 16, 187–199 (2018).
OpenUrl

[24] 24.↵
Li, W. et al. 5-Hydroxymethylcytosine signatures in circulating cell-free DNA as diagnostic biomarkers for human cancers. Cell Research 1–15 doi:10.1038/cr.2017.121
OpenUrl CrossRef

[25] 25.↵
Elements, D. An integrated encyclopedia of DNA elements in the human genome. doi:10.1038/nature11247
OpenUrl CrossRef PubMed Web of Science

[26] 26.↵
Huang, L. et al. Ductal pancreatic cancer modeling and drug screening using human pluripotent stem cell- and patient-derived tumor organoids. Nature medicine 21, 1364–71 (2015).
OpenUrl CrossRef PubMed

[27] 27.↵
Burke, Z. & Oliver, G. Prox1 is an early specific marker for the developing liver and pancreas in the mammalian foregut endoderm. Mechanisms of Development 118, 147–155 (2002).
OpenUrl CrossRef PubMed Web of Science

[28] 28.↵
Zhang, H. et al. Multiple, temporal-specific roles for HNF6 in pancreatic endocrine and ductal differentiation. Mechanisms of development 126, 958–73 (2009).
OpenUrl CrossRef PubMed

[29] 29.↵
Rozengurt, E., Sinnett-Smith, J. & Eibl, G. Yes-associated protein (YAP) in pancreatic cancer: at the epicenter of a targetable signaling network associated with patient survival. Signal Transduction and Targeted Therapy 3, 11 (2018).
OpenUrl

[30] 30.↵
Saukkonen, K. et al. PROX1 and β-catenin are prognostic markers in pancreatic ductal adenocarcinoma. BMC Cancer 16, 472 (2016).
OpenUrl

[31] 31.↵
Friedman, J., Hastie, T. & Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of statistical software 33, 1–22 (2010).
OpenUrl CrossRef

[32] 32.↵
Haury, A.-C. C., Gestraud, P. & Vert, J.-P. P. The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures. PloS one 6, e28210 (2011).
OpenUrl CrossRef PubMed

[33] 33.↵
Tian, X. et al. Circulating tumor DNA 5-hydroxymethylcytosine as a novel diagnostic biomarker for esophageal cancer. Cell Research 1–4 (2018). doi:10.1038/s41422-018-0014-x
OpenUrl CrossRef

[34] 34.↵
Szulwach, K. E. et al. Integrating 5-hydroxymethylcytosine into the epigenomic landscape of human embryonic stem cells. PLoS Genetics 7,

[35] 35.↵
Morgan, M. A. & Shilatifard, A. Chromatin signatures of cancer. Genes & Development 29, 238–249 (2015).
OpenUrl Abstract/FREE Full Text

[36] 36.↵
Calo, E. & Wysocka, J. Modification of Enhancer Chromatin: What, How, and Why? Molecular Cell 49, 825–837 (2013).
OpenUrl CrossRef PubMed Web of Science

[37] 37.↵
Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102, 15545–15550 (2005).
OpenUrl Abstract/FREE Full Text

[38] 38.↵
Lee, K. & Pausova, Z. Cigarette smoking and DNA methylation. Front Genet 4, 132 (2013).
OpenUrl PubMed

[39] 39.↵
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
OpenUrl CrossRef PubMed Web of Science

[40] 40.↵
Law, C. W., Alhamdoosh, M., Su, S., Smyth, G. K. & Ritchie, M. E. RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR. F1000Research 5, 1408 (2016).
OpenUrl

[41] 41.↵
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the royal statistical society. Series B (Methodological) 289–300 (1995).

Detection of early stage pancreatic cancer using 5-hydroxymethylcytosine signatures in circulating cell free DNA

Abstract

Introduction

Results

Clinical cohort and Study Design

Sequencing results and metrics

Distributions of 5hmC densities into functional regions in PDAC and non-cancer cohort

Identification of disease specific genes from plasma samples

Predictive models for the detection of pancreatic cancer in cfDNA

Discussion

Methods

Clinical cohorts and study design

Cancer cohort

Non-cancer cohort

Plasma collection

cfDNA isolation

5-hydroxymethyl Cytosine (5hmC) assay enrichment

DNA sequencing and alignment

Peak Detection

Differential Representation Analysis

Predictive Modelling

References

Citation Manager Formats

Subject Area