Abstract
The transformation biology of secondary acute myeloid leukemia (sAML) arising from myelodysplastic syndromes (MDS) is still not fully understood. Here, we analyzed a large cohort of paired samples using targeted, whole-exome, and single-cell sequencing to search for AML transformation-related last events. Fifty-five of the 64 patients (85.9%) presented presumptive last mutation events involving activated signaling, transcription factors, or tumor suppressors. Most last mutation events (63.6%, 35 cases) emerged at the leukemia transformation point. Of the remaining nine patients, all five who were analyzed by paired whole-exome sequencing showed transformation-related mutations that were not included in the target panel. Single-cell sequencing indicated that activation of cell signaling was related to last gene mutation events, which take place prior to phenotypic development. Of note, the last gene mutation events defined using targeted sequencing were limited to a small set of genes (fewer than ten; in order, NRAS/KRAS, CEBPA, TP53, FLT3, RUNX1, CBL, PTPN11, and WT1 accounted for 91.0% of the mutations). In conclusion, somatic mutations involving activated signaling, transcription factors, or tumor suppressors appeared to be a precondition for AML transformation from MDS. The transformation-related gene mutations may be considered as new therapeutic targets.
Introduction
In most cases, de novo acute myeloid leukemia (AML) shows rapid onset without an obvious pre-AML period.1 Patients have been reported to show genetic abnormalities, such as PML/RARa in the M3 subset, the ETO fusion gene in the M2b subset, and CBFβ-MYH11 in M4 EO.2–4 In addition, somatic mutations involving CEBPA, FLT3, NPM1, and c-Kit have been used to support prognosis.5 On the basis of these biological characteristics, some target-specific and immunological methods have been developed to treat AML.6 Unlike de novo AML, secondary AML (sAML) transformed from myelodysplastic syndromes (MDS) shows unique genetic features. First, there is often a pre-AML stage, in which somatic gene mutations result in initial events (such as clonal hematopoiesis) and/or driver events (such as development of MDS phenotypes) involving epigenetic regulation, RNA splicing, transcription, etc.7,8 MDS are malignant myeloid disorders associated with abundant gene mutations at diagnosis. Second, despite preliminary findings on the roles of late-stage gene mutations involving various signaling pathways or transcription during sAML development,9,10 the transformation biology of sAML from MDS is still not fully understood. Many questions remain unanswered: Can somatic mutations solely involved in epigenetics or RNA splicing also induce sAML? Which mutations are the key factors (last events) that induce the transformation process? When do the last events emerge? Are they pre-existing at MDS diagnosis, or do they emerge when the transformation starts? We proposed that analyzing a large cohort of paired samples would help answer these questions, hypothesizing that mutations involving signaling pathways, myeloid transcription factors, or tumor suppressors are a precondition for MDS to progress to sAML. After more than 10 years of following up over 1,000 patients with MDS in our department, we identified 64 patients among those who underwent AML transformation.
Paired data were acquired at diagnosis and immediately after AML transformation. Sequencing of 39 target genes was carried out first, and whole-exome sequencing was then performed for a few patients whose targeted sequencing did not yield meaningful results. In addition, one paired sample was processed using single-cell RNA sequencing. Using these techniques, we aimed to obtain data for a preliminary analysis of sAML transformation biology, and we envisaged developing a novel strategy to block MDS transformation.
Patients and methods
Sample recruitment
From January 2004 till October 2019, a total of 1,427 patients were diagnosed with MDS in our department. Of them, over 90% of the patients were followed up by calls or with face-to-face consultations, and over 800 patients consented to providing samples for next generation sequencing analysis. DNA was extracted from the bone marrow (BM) samples obtained from some patients before 2009 and retrospectively used for target sequencing or whole exome sequencing. For patients whose BM samples were acquired in or after 2009, DNA was extracted from the samples and immediately used for targeted sequencing or whole exome sequencing. For single-cell RNA transcription sequencing, fresh BM was used to isolate mononuclear cells within six hours of BM aspiration. Targeted sequencing was performed using paired samples (at the diagnosis and transformation points) to detect transformation-related gene mutations (last events). Whole exome sequencing was performed when conclusive results for last mutation events were not obtained via targeted sequencing and when adequate residual DNA extracts were available. Single-cell RNA transcription sequencing was used to examine the underlying transformation dynamics. Diagnoses for MDS and sAML were conducted in strict accordance with the WHO criteria, and the CMML subset was diagnosed according to the FAB classification.11,12 All of the subjects provided written informed consent for genetic analysis under a protocol that was approved by the Ethics Committee of Shanghai Jiao Tong University Affiliated Sixth People’s Hospital. All methods were performed in accordance with the relevant guidelines and regulations.
Genomic DNA preparation, target enrichment, and sequencing
Genomic DNA (gDNA) was extracted using the DNeasy Blood and Tissue Kit (Qiagen, Germany) according to the manufacturer’s protocol. Genomic DNA was sheared using the Covaris® system (Covaris, USA), and the DNA sample was prepared using the Truseq DNA Sample preparation Kit (Illumina, USA) according to the manufacturer’s protocol.
For probe design, both coding and regulatory regions of the target genes were included in the custom panel. The regulatory regions comprised promoter regions (defined as 2 kb upstream of the transcription start site), the 5′ untranslated region (5′-UTR), and intron-exon boundaries (50 bp). Custom capture oligos were designed using the SureDesign website of Agilent Technologies (Agilent, USA). Hybridization reactions were carried out on an ABI 2720 Thermal Cycler (Life Technologies, USA): the hybridization mixture was incubated for 16 or 24 h at 65°C with a heated lid at 105°C. After hybridization, the mixture was captured and washed with magnetic beads (Invitrogen, USA) and the SureSelect target enrichment kit (Agilent, USA). The captured product was enriched under the following cycling conditions: 98°C for 30 s; 10 cycles of 98°C for 10 s, 60°C for 30 s, and 72°C for 30 s; and 72°C for 5 min. Library quality was assessed using an Agilent 2100 Bioanalyzer (Agilent, USA), and multiplexed sequencing was performed on HiSeq 2500 sequencers with 2×150 paired-end modules (Illumina, USA). Total sequencing depth was 200×. The following 39 target genes were sequenced: ANKRD11, ASXL1, BCOR, CALR, CBL, CEBPA, DHX9, DNMT3A, ETV6, EZH2, FLT3, GATA2, IDH1, IDH2, ITIH3, JAK2, KIF20B, KIT, KRAS, MPL, NF1, NPM1, NRAS, PHF6, PTPN11, PTPRD, ROBO1, ROBO2, RUNX1, SETBP1, SF3B1, SRSF2, STAG2, TET2, TP53, U2AF1, UPF3A, WT1, and ZRSR2.
Whole-exome sequencing
The gDNA library was prepared using a TruSeq DNA Sample Preparation Kit (Illumina, San Diego, CA, US) in accordance with the manufacturer’s protocol. In-solution exome enrichment was performed using a TruSeq Exome Enrichment kit (Illumina) according to the manufacturer’s instructions. The enriched DNA samples were sequenced via 2×100 paired-end sequencing using a Hiseq2000 Sequencing System (Illumina). Illumina Sequencing Control v2.8, Illumina Off-Line Basecaller v1.8, and Illumina Consensus Assessment of Sequence and Variation v1.8 software were used to produce 100-base pair (bp) sequence reads.
Sequencing data processing, variant calling, and annotation
Before variant calling, the raw sequence reads were mapped to the reference genome (hg19), duplicate reads were marked and removed to mitigate biases introduced by data generation steps such as PCR amplification, and the base quality scores were recalibrated using the Genome Analysis Toolkit (GATK). The Ensembl VEP and vcf2maf tools were applied to generate MAF-format files for somatic mutation annotation, and the ANNOVAR tool was used to annotate the population frequency of each variant. Variants were classified as low-frequency functional mutations if they had a frequency of < 0.1 in the ExAC03 database, < 0.01 in the 1000 Genomes database, and < 0.05 in the GeneskyDB database. Variants were then retained if they carried one of the following functional annotations: “Frame_Shift_Del”, “Frame_Shift_Ins”, “In_Frame_Del”, “In_Frame_Ins”, “Missense_Mutation”, “Nonsense_Mutation”, “Nonstop_Mutation”, “Splice_Site” or “Translation_Start_Site”.
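The frequency and annotation filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline actually used: the record layout, the field names (`exac03`, `g1000`, `geneskydb`), the treatment of missing frequencies, and the example records are all hypothetical; only the cutoffs and the annotation classes come from the text.

```python
# Functional classes named in the text (MAF Variant_Classification values).
FUNCTIONAL_CLASSES = {
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Missense_Mutation", "Nonsense_Mutation", "Nonstop_Mutation",
    "Splice_Site", "Translation_Start_Site",
}

# Population-frequency cutoffs quoted in the text.
# The keys are hypothetical field names, not the real annotation columns.
MAX_POPULATION_FREQ = {"exac03": 0.1, "g1000": 0.01, "geneskydb": 0.05}


def is_low_frequency_functional(variant):
    """Return True if a variant passes both the class and frequency filters.

    Assumption: a frequency of None means the variant is absent from that
    database, and it is treated as passing that cutoff.
    """
    if variant["classification"] not in FUNCTIONAL_CLASSES:
        return False
    for database, cutoff in MAX_POPULATION_FREQ.items():
        freq = variant.get(database)
        if freq is not None and freq >= cutoff:
            return False
    return True


# Made-up example records, for illustration only.
variants = [
    {"gene": "NRAS", "classification": "Missense_Mutation",
     "exac03": None, "g1000": None, "geneskydb": 0.0},
    {"gene": "TET2", "classification": "Missense_Mutation",
     "exac03": 0.2, "g1000": 0.15, "geneskydb": 0.3},
    {"gene": "TP53", "classification": "Silent",
     "exac03": None, "g1000": None, "geneskydb": None},
]
kept = [v["gene"] for v in variants if is_low_frequency_functional(v)]
```

In this sketch a variant must pass all three database cutoffs, matching the "and" in the description; a variant sitting exactly at a cutoff is rejected because the stated comparisons are strict.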
Single-cell RNA sequencing and bioinformatic data analysis
The concentration of BM mononuclear cells (BMMCs) was measured using a hemocytometer and adjusted to 700-2,000 cells/μL. Single-cell RNA-seq libraries were prepared using the 10x Genomics Single Cell 3’ reagent v2 according to the manufacturer’s instructions. Twelve and 14 cycles were used for complementary DNA and index PCR amplifications, respectively. The quality of the amplified cDNA and libraries was assessed using an Agilent 2100 Bioanalyzer (Agilent, USA). Libraries were pooled and sequenced on a HiSeqX Ten (Illumina) with 2×150 paired-end modules, and at least 50K mean reads per cell were generated. Raw sequencing data were processed using Cell Ranger version 3.1.0 (10x Genomics) with default parameters. Quality control metrics were used to remove cells with a mitochondrial gene percentage of more than 20% or with fewer than 200 genes detected. Variably expressed gene selection, dimensionality reduction, and clustering were performed using the Seurat package version 3.1.3. Principal component analysis was performed on significantly variable genes, and the first 40 principal components were used for UMAP dimension reduction. The cell type of each cluster was identified using singleR version 1.0.1. For paired samples, differentially expressed genes (fold change > 2 and Wilcoxon test P value < 0.05) were identified in each of the seven cell types independently, and those related to the newly mutated genes PTPN11 and NRAS were used for GO and KEGG enrichment analysis.
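The cell-level quality control and the differential expression thresholds described above amount to two simple predicates. The sketch below expresses them in plain Python for clarity; it is not the Seurat code that was run, the function names are hypothetical, and treating "fold change > 2" as applying in either direction (up or down) is an assumption.

```python
MAX_PCT_MITO = 20.0   # cells above this mitochondrial percentage are removed
MIN_GENES = 200       # cells with fewer detected genes are removed
MIN_FOLD_CHANGE = 2.0
MAX_P_VALUE = 0.05


def passes_qc(n_genes_detected, pct_mito):
    """Keep a cell only if it meets both quality control criteria."""
    return n_genes_detected >= MIN_GENES and pct_mito <= MAX_PCT_MITO


def is_differentially_expressed(fold_change, p_value):
    """Apply the fold change and P value thresholds to one gene.

    Assumptions: fold_change is the positive ratio of mean expression after
    vs. before progression, and a change of more than 2-fold in either
    direction counts.
    """
    changed = fold_change > MIN_FOLD_CHANGE or fold_change < 1.0 / MIN_FOLD_CHANGE
    return changed and p_value < MAX_P_VALUE


# Made-up cells as (genes detected, mitochondrial percentage) pairs.
cells = [(1500, 4.2), (150, 3.0), (2200, 35.0), (200, 20.0)]
retained = [cell for cell in cells if passes_qc(*cell)]
```

Boundary cells (exactly 200 genes, exactly 20% mitochondrial reads) are kept here because the text removes only cells with "fewer than" 200 genes or "more than" 20%.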
Definition of the presumed last mutation
The presumed AML transformation-related mutations (last events) had to meet the following conditions:
1. They must be involved in at least one of three functional pathways: active signaling, myeloid transcription, or tumor suppression.
2. They either emerged after AML transformation (greater weight) or pre-existed at MDS diagnosis (lesser weight). Newly emerged mutations were preferentially considered as the presumed last events.
3. When two or more candidate mutations co-existed, the last event was defined as the biologically more aggressive one (active signaling > myeloid transcription > tumor suppressor), or the one with the lower variant allele frequency (VAF) among newly emerged mutations (indicating the latest emergence), or the specific clone that transitioned from the MDS to the sAML point.
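The selection rules above can be approximated as a ranking over candidate mutations. The following sketch is one possible reading, not the authors' procedure: it applies the criteria in a fixed order (newly emerged first, then pathway aggressiveness, then lower VAF among newly emerged mutations), whereas the definition lists the tie-breakers as alternatives, and it does not model the clone-transition criterion. The record fields, pathway labels, and example values are hypothetical.

```python
# Lower rank = biologically more aggressive, following the order in the text.
PATHWAY_RANK = {"signaling": 0, "transcription": 1, "tumor_suppressor": 2}


def pick_last_event(mutations):
    """Return the presumed last event from a list of mutation records.

    Each record is a dict with keys 'gene', 'pathway', 'newly_emerged'
    (True if absent at MDS diagnosis) and 'vaf' (at the sAML time point).
    Returns None when no mutation falls in one of the three pathways.
    """
    candidates = [m for m in mutations if m["pathway"] in PATHWAY_RANK]
    if not candidates:
        return None

    def priority(m):
        # VAF is only used to order newly emerged mutations.
        vaf_key = m["vaf"] if m["newly_emerged"] else 0.0
        return (not m["newly_emerged"], PATHWAY_RANK[m["pathway"]], vaf_key)

    return min(candidates, key=priority)


# Made-up patient, for illustration only.
patient = [
    {"gene": "TET2", "pathway": "epigenetic", "newly_emerged": False, "vaf": 0.45},
    {"gene": "RUNX1", "pathway": "transcription", "newly_emerged": False, "vaf": 0.40},
    {"gene": "TP53", "pathway": "tumor_suppressor", "newly_emerged": True, "vaf": 0.12},
    {"gene": "NRAS", "pathway": "signaling", "newly_emerged": True, "vaf": 0.20},
]
last_event = pick_last_event(patient)
```

With this ordering, a newly emerged tumor suppressor mutation outranks a pre-existing signaling mutation, reflecting the stated preference for newly emerged mutations; a different ordering of the tie-breakers would be equally defensible given the wording of the definition.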
Statistical analysis
Statistical analyses were conducted using SPSS software version 18.0. Kaplan-Meier analysis was used to evaluate survival time and time to progression. All P-values were based on 2-sided tests, and P-values less than 0.05 were considered statistically significant.
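For readers unfamiliar with the Kaplan-Meier method mentioned above, the product-limit estimator can be written compactly. The analyses in this study were run in SPSS; the Python sketch below only illustrates the estimator itself, and the function name and example durations are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up durations (e.g. months to AML transformation)
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing the same time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set.
        n_at_risk -= n_leaving
    return curve


# Made-up durations in months; the third subject is censored at month 4.
curve = kaplan_meier([2, 4, 4, 6], [1, 1, 0, 1])
```

Censored subjects contribute to the number at risk up to their censoring time but never trigger a drop in the curve, which is what distinguishes this estimate from a simple proportion of events.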
Results
Target sequencing
Paired samples were acquired from 64 patients and were analyzed using target sequencing. Detailed data are presented in Table 1. We concluded that:
1. Last mutations were verified for 55 of the 64 patients (85.9%), according to the steps described in METHODS.
2. Among the 55 patients with positive results, last mutations were first detected at AML transformation in 35 cases (63.6%). In the remaining 20 cases, last mutations were already detected at the point of diagnosis (Table 1). The former was termed the first transformation pathway and the latter the second transformation pathway. Further analysis showed that patients following the first transformation pathway had a longer transformation duration (mean 13.4 months vs 9.6 months for the second; median 11.0 months vs 8.5 months for the second; P=0.106) and a higher proportion of lower-risk (<RAEB2/CMML2) cases at diagnosis (80.0% vs 75.0% for the second; P=0.431), although neither difference reached statistical significance.
3. As shown in Tables 1 and 2 and Figure 1, the defined last mutations fell into the following groups, in descending order of frequency. The first group was active signaling: KRAS/NRAS in 11 patients (two had CBL or PTPN11 as last mutation events); FLT3 in six patients (one had CBL as last event); CBL in six patients (three had NRAS, FLT3, and CEBPA, respectively); and PTPN11 in four patients (one had NRAS); the remaining two had PTPRD and GATA2, respectively. The second group was myeloid transcription: CEBPA in 10 patients (one had CBL), RUNX1 in five patients, and ETV6 in two patients. The third group was tumor suppressors: TP53 mutations in nine patients, WT1 mutations in three patients, and NPM1 in one patient. c-Kit mutations were not detected in this group of patients. Last mutation events appeared to be highly concentrated in about 10 genes. Figure 1 shows the pattern of these last mutation events, showing that the FLT3 mutations involved differed from the FLT3-ITD that is often observed in de novo AML.
4. Of the 55 paired samples, we acquired one additional targeted sequencing result between the diagnosis point and AML transformation for six cases (Figure 2). Of these, three samples (UPN1243, 3288, and 4390) showed evidence of the last mutations emerging before the phenotype change, i.e., when the last mutation event emerged, the disease was still in the non-AML stage, and it transformed to sAML soon afterward.
5. Analysis of the developed mutation patterns and VAF, as well as the logical relations between them, revealed differing patterns of evolution from MDS to sAML. Of the 55 patients with positive findings, most (25 cases) presented AML-transformation-related progression by linear evolution from the founding clone (Figure 3a). Five cases presented linear evolution from subclones (Figure 3b). Only four cases presented sweeping clone evolution (Figure 3c). The evolution pattern of the remaining 21 patients either could not be defined by mutation development/VAF (4 cases), or presented no obvious alteration before and after progression of AML.
6. Interesting rivalry patterns were observed. Some original gene mutations (possibly acting as last events) were replaced or covered by other mutations at the precise point of occurrence of sAML, as in patients 2 (UPN3430), 5 (UPN809), 9 (UPN4603), 13 (UPN2570), and 54 (UPN3567) (Table 1).
7. Table 2 presents summarized data from Table 1. Although discrepancies existed among different last mutation events, TET2/RUNX1/ASXL1/DNMT3A/EZH2/U2AF1 mutations were the most common initial/driver mutations for these AML-transformed patients. However, TET2/ASXL1/BCOR/ROBO1 mutations were the ones that most commonly accompanied the last events when transformation occurred.
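The headline proportions and the clonal evolution tallies reported in the findings above can be cross-checked with simple arithmetic. The short sketch below recomputes them from the counts stated in the text; the variable names are illustrative only.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


cohort = 64                     # patients with paired samples
with_last_event = 55            # last mutation identified by targeted sequencing
emerged_at_transformation = 35  # first transformation pathway
present_at_diagnosis = with_last_event - emerged_at_transformation

# Evolution patterns among the 55 patients with positive findings.
evolution = {
    "linear_from_founding_clone": 25,
    "linear_from_subclone": 5,
    "sweeping_clone": 4,
    "undefined_or_unchanged": 21,
}

detection_rate = pct(with_last_event, cohort)
first_pathway_rate = pct(emerged_at_transformation, with_last_event)
evolution_total = sum(evolution.values())
```

The recomputed values agree with the reported 85.9% and 63.6%, the second pathway accounts for the remaining 20 cases, and the four evolution categories sum to the 55 patients with positive findings.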
Whole exome sequencing
Samples from five of the nine patients whose targeted sequencing showed negative results, and for whom sufficient DNA extract was still available, were further subjected to whole exome sequencing (WES) (patients labelled with star symbols in Table 1). Figure 4 presents the WES results for the five cases. In addition to some MDS-related gene mutations, each of them gained at least one AML transformation-related gene mutation (involving activated signaling, a transcription factor, or a tumor suppressor) at the AML transformation point. Specifically, UPN1601 and UPN3053 acquired NOTCH1 and PAK5 mutations, respectively (both involved in signal transduction), during disease progression. NOTCH1 mutations are reported to occur frequently in acute leukemia, whereas PAK5 contributes to the proliferation of tumor cells as a serine/threonine protein kinase.13,14 UPN1702 and UPN3831 acquired MLLT10 and NCOR2 mutations, respectively (both involved in transcription regulation), at the AML stage. MLLT10 is a histone lysine methyltransferase, which participates in AML pathogenesis via the formation of the fusion gene MLLT10-MLL.15 NCOR2 regulates gene transcription as part of a histone deacetylase complex, and its dysfunction leads to functional abnormality in hematopoietic stem cells.16 UPN3155 acquired an STK11 mutation (a tumor suppressor) after disease progression. It has been reported that STK11 regulates cell polarity and functions as a tumor suppressor.17
Single-cell RNA sequencing
As mentioned above, abnormal cell signaling induced by last-event mutations in genes such as RAS and PTPN11 may be critical for the transformation of MDS into AML. However, it is still unclear whether the RAS mutation is a definite precursor to the activation of RAS signaling. To elucidate this, we used single-cell RNA sequencing to study the association between RAS mutation and RAS signaling in UPN4674 and UPN4763 (before and after disease progression). The two last events (NRAS and PTPN11 mutations) occurring during the AML stage involve core genes in RAS signaling. As shown in Figure 5a and 5b, this patient presented several abnormal cell types (orange, UPN4674; blue, UPN4763 in Figure 5a). Gene classification analysis showed that distinct granulocyte-monocyte progenitors (GMP), common myeloid progenitors (CMP), megakaryocyte-erythroid progenitors (MEP), and monocytes (Mono) occurred during the AML stage (Figure 5b). According to the clinical information for this patient (from RCMD to AML), GMP should be considered the main target cell group. Integrated analysis based on single-cell sequencing indicated that several RAS signaling-related genes were expressed at high levels in GMP after disease progression (Figure 5c). These genes have been reported to participate in the activation of RAS signaling.18 Similarly, RAS signaling-related genes such as FLT3, INSR, and CDC42 were also expressed at high levels in the MEP, CMP, and Mono groups after disease progression (Figure 5d). These genes are closely associated with cell proliferation. Interestingly, the apoptosis-related gene BAD was down-regulated in the GMP, MEP, CMP, and Mono groups after disease progression. These data suggest that last event gene mutations induce the redistribution of clonal cells (leading to an increased number of morphological blasts and monocytes) via the activation of cell signaling, which in turn leads to AML transformation.
Discussion
It should be emphasized that the pathogenesis of sAML differs from that of de novo AML in respects including clinical progression and prognosis.19 Despite its poor response to contemporary therapies, including transplant, MDS-derived AML exhibits a relatively long pre-transformation duration and occurs in only about one third of all MDS cases.7 If this transformation (from MDS towards AML) requires triggering factors (specific gene mutations, which may be termed last events in MDS-sAML), and if these precursor mutations are reliable and limited to a manageable number of types, it would be possible to develop countermeasures to block the transformation.
In this study, we sequenced 64 paired DNA extract samples, first at MDS diagnosis and then immediately after AML transformation. Fifty-five of the 64 cases (85.9%) were positive for transformation-related mutations, i.e., events involving active signaling, transcription factors, and tumor suppressors. Sequencing based on 39 target genes could not detect any transformation-related last events in nine of the 64 cases. Of these, paired whole exome sequencing was performed for the five cases for which sufficient DNA could be extracted from BM samples. Each of the five patients was positive for at least one transformation-related mutation involving active signaling, myeloid transcription, or tumor suppressors (Figure 4). We did not find a single patient with MDS who developed sAML with only mutations involving epigenetic modulation, RNA splicing, cohesin, etc. This finding suggests that the existence or emergence of some last event-like mutations should be considered a prerequisite for MDS to transform into sAML, rather than just an occasional or accompanying event.9,10
Among the sixty patients whose last mutations were successfully detected, the last events appeared to emerge at the point of AML transformation in 40 patients (66.7%; 35 by targeted sequencing and 5 by WES), and the remaining 20 patients were positive for last mutations at the time of MDS diagnosis. It is therefore necessary to determine which comes first: the existence or emergence of last event mutations, or the AML transformation itself. We considered the pre-existence or emergence of last events to be the cause of AML transformation. First, 20/60 (33.3%) cases harbored transformation-related mutations at disease diagnosis, and the duration of transformation was shorter in these cases than in those where last events emerged only at the transformation point (mean 9.8 vs 13.4 months). Second, owing to the relatively long transformation duration and the correspondingly long interval between the diagnosis and AML transformation time points used for paired sequencing, some important data may have been missed. When additional check points were added, supportive data were obtained from some patients (UPN1243, 3288, and 4390 in Figure 2): when the last events emerged, the disease was still at the low-risk or pre-AML stage, and it transformed into AML soon afterward. Finally, for one patient (UPN 4603), immediately after the occurrence of the AML phenotype, accompanied by the positive conversion of the NRAS and PTPN11 mutations, single-cell RNA sequencing showed both active RNA transcription and activation of the RAS/PTPN11 signaling pathway.20,21 This is consistent with the expected order of events, in which gene mutations occur prior to RNA/protein transcription/translation, pathway activation, and ultimately, phenotype alteration. Hence, we consider the related mutations to be the cause of AML transformation.
Most of the last event clones showed linear evolution from the founding clones (25 cases, compared with 5 cases of linear evolution from subclones and 4 cases of clone sweeping) (Figure 3), which differs slightly from published data.22,23 In addition, in this study, we detected some unique mutations involving TET2/RUNX1/ASXL1/DNMT3A (Table 2), indicating that these mutations could potentially be the precursor or driver mutations for these AML cases. TET2/ASXL1/BCOR and ROBO1 mutations commonly emerged as partners of last events (Table 2), possibly playing some role in the transformation process. Some mutations, such as RUNX1 (highly prevalent among patients with chromosome 7 abnormalities) and BCOR (common in cases with normal chromosomes), could be responsible either for the MDS phenotype or for AML progression. Together, these dynamic genetic features may contribute to the complexity and heterogeneity of this disease and may become a key to deciphering the rules of sAML occurrence.
Of note, the detected last mutations came from a cluster of 10 related genes. c-KIT mutation was never detected in this group of patients, and NPM1 was detected in only one case. The latter could be attributed to the favorable responses to HMAs (decitabine) of patients with MDS who harbor NPM1 mutations, thus blocking sAML transformation.24 Together with TP53, mutations in NRAS/KRAS, CEBPA, and FLT3 were observed in 63.6% of the 55 cases for which last mutations were confirmed by targeted sequencing. When RUNX1/CBL/PTPN11/WT1 and ETV6 were added, only 10 genes accounted for almost all the patients with MDS (94.5%) who were likely to undergo AML progression. In view of the rapid development of mutation-specific targeted therapy, we can attempt to block AML transformation in patients with MDS through continuous monitoring of last mutations and administration of the corresponding effective targeted therapy.
In summary, somatic mutations in activated signaling, transcription factors, or tumor suppressors appear to be a precondition for AML transformation in myelodysplastic syndromes. The clustering of transformation-related mutations in a small set of genes can be researched further and exploited for new targeted therapies.
Conflict of Interest
The authors have no financial or non-financial interests to declare.
Author contributions
X.L. was the principal investigator who conceived the study. C.-K.C., F.X., and L.-Y.W. carried out most of the experiments. L.-X.S. and Q.H. were responsible for bioinformatics investigation. J.G. and Y.T. participated in the preparation of biological samples. Z.Z., D.W., L.-Y.Z., C.X., and J.-Y.S. helped gather detailed clinical information for the study and helped to carry out clinical analysis. X.L., C.-K.C., and F.X. wrote the manuscript.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (81770120 and 81770122). We thank Shanghai Tianhao Inc. for providing assistance in exome sequencing and data analysis.