Abstract
Early identification of the risk for triple-negative breast cancer (TNBC) at the asymptomatic phase could lead to better prognosis. Here we developed a machine learning method to quantify systematic impact of all rare germline mutations on each pathway. We collected 106 TNBC patients and 287 elder healthy women controls. The spectra of activity profiles in multiple pathways were mapped and most pathway activities exhibited globally suppressed by the portfolio of individual germline mutations in TNBC patients. Accordingly, all individuals were delineated into two types: A and B. Type A patients could be differentiated from controls (AUC = 0.89) and sensitive to BRCA1/2 damages; Type B patients can be also differentiated from controls (AUC = 0.69) but probably being protected from BRCA1/2 damages. Further we found that Individuals with the lowest activity of selected pathways had extreme high relative risk (up to 21.67 in type A) and increased lymph node metastasis in these patients. Our study showed that genomic DNA contains information of unimaginable pathogenic factors. And this information is in a distributed form that could be applied to risk assessment for more cancer types.
Significance We identified individuals who are more susceptible to triple negative breast cancer. Our method performs much better than previous assessments based on BRCA1/2 damages, even polygenic risk scores. We disclosed previously unimaginable pathogens in a distributed form on genome and extended risk prediction to scenarios for other cancers.
Introduction
The continuously increasing prevalence and mortality of breast cancers is a major health concern in women worldwide (1–3). Triple-negative breast cancer (TNBC) is a breast cancer subtype with absence of estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor 2 (HER2), which makes up 15% to 20% of all breast cancer cases (4). TNBC is clinically aggressive because 1) the average age of initial diagnosis is 4 - 13 years earlier compared to other types of breast cancer (5–12); 2) The five-year survival rate is low, which reflects fast progression and poor prognosis (13–15); 3) lack of efficient therapy strategies due to the lack of either endocrine or anti-HER2 targets (12,16,17). Nonetheless, identification of the biomarkers for prognosis at earlier stages including the asymptomatic stage could facilitate timely treatment and expand the surgery window for breast cancer patients - TNBC the most, as in reducing the mortality (18,19).
Evidence that supports genetic hereditary features of breast cancer, especially TNBC, forms the foundation of our rationale. Inherited mutations have been observed in multiple epidemiological studies and clinical trials of TNBC with a positive percentage generally higher than 35% across different ethnicities (5,9,12,20,21), where the hereditary predisposition for sporadic cases might be masked by a limited family structures and driven in a non-Mendelian fashion (22). High penetrance predisposition genes, such as the notorious BRCA1/2 and other genes that function in homologous recombination (HR) repairing of double strand DNA breaks (23–28), provide more profound evidence for the germline heredity of TNBC. BRCA1/2 germline mutations are found in 5.3% of unselected breast cancers according to The Cancer Genome Atlas (TCGA) while the prevalence percentages rise up to 9.4 to 18.2% in TNBC from different study cohorts (5,7–10,29–32). In addition, data from investigations of genetic counselling showed that 40% of BRCA1/2 germline mutation carriers have been diagnosed with TNBC before the age of 40, twice as high as the proportion in all unselected breast cancer cases (12,33). These findings suggest that the germline mutations might play a more important role in TNBC compared to other types of breast cancer, which might contribute to its early onset, as well as other features.
However, germline determined risk is far from being explicitly evaluated in TNBC breast cancer. On the aspect of population coverage, the prevalence of disease-associated mutations for each individual gene is generally less than 5% except for BRCA1/2 (23,25,27,28), making it hard to extract the distribution of common features in heterogeneous disease such as TNBC with merely one or several genes. On the other hand, current predisposition genes are mostly limited in homologous recombination genes identified from studies based on different cohorts that report diverged or even controversial results especially when interventions, such as therapies, are introduced (5,34–39). These all suggest variable features of connections within the disease are not described by the known genes and thus call for more holistic evaluation approaches.
Quantified genome-wide polygenic approaches have been adopted in predicting risks of multiple genetic diseases, such as the coronary artery disease and Type 2 diabetes (40), and the deep learning algorithms are adopted to empower robust data-driven feature recognition in the application of identifying human pathogenic germline mutations (41) as well as gene expression projection from the genome mutations (42). Nonetheless, these models’ performance egregiously depends on the degree of inherent heterogeneity of diseases, meaning the fewer numbers of biological functions are affected by the disease, the better their representations would be (40–42). Since cancers are significantly more heterogeneous than other complex genetic diseases, the applications of previous models have been constrained (14). Our study aims to tackle a question that necessitates an answer yet not solved - who are more likely to suffer from the triple-negative breast cancer? Our goal is to find a quantitative standard using the germline mutation information - the earliest predictive features in any given individual, so that susceptible individual could be accurately identified from the female population with false negatives and false positives reduced at the same time. The specific approach we take is primarily to build a machine learning framework based on genetic data and prior information about biological functions. The phylogenetic coding variation information is projected step by step in a new spatial space that reflects the activity of pathways. Based on this, we find that healthy risk-free individuals are significantly spatially separated from patients and all the subjects can be clustered according to their distributions. Intriguingly, we find the identified classifications could predict relative risk of developing TNBC, clinical staging characteristics; and the pathogenicity of BRCA1/2 is associated with the identified classifications, supporting the validity of the methodology and its future application to a wide range of disease risk assessment scenarios.
Materials and Methods
Study participants
The study cohorts included 106 TNBC patients recruited from this study and 287 cancer-free elderly women (CTRL) over 80 years old from an open dataset (43), which served as the control. The statistics of clinical and general information were shown in supplementary Table 1-3 for TNBC patients and supplementary Table 4 for the Control. The study protocol was approved by the Research Ethics Committee at Guangdong General Hospital, Guangdong Academy of Medicals Sciences. Written informed consent was obtained from all participants for the use of banked tissues (including white blood cells and buccal cells), collection of pathological data and clinical follow-up data. The SNP data on CTRL was gathered from a previous report (43).
Whole-exome sequencing and mutation calling
Peripheral blood mononuclear cells (PBMC) were collected from TNBC patients, which was defined by ≤1% estrogen receptor (ER) positive tumor cells, progesterone receptor (PR) negativity, and normal HER2-receptor expression (44). Each sample was prepared according to the Illumina protocols. Paired-end multiplex sequencing of samples was performed on an Illumina HiSeq X Ten sequencing platform. Samples from the CTRL cohort were sequenced on an Illumina HiSeq 2000 system. TNBC cases were sequenced to a median depth > 120X and controls were sequenced to a median depth > 60X. Paired-end raw sequence reads were mapped to the human reference genome (UCSC hg19) using the Burrows-Wheeler Aligner (45) with default settings. Mutation calling was carried out using the HaplotypeCaller module in the Genome Analysis Toolkit (GATK) (46) according to GATK Best Practices. Briefly, the aligned BAM files were first marked for duplicated reads by Picard. Local realignment around indels and base quality score recalibration were performed by GATK. The processed BAM files were then used to call SNPs and indels. Mutation filtering of SNPs and indels were performed separately by mutation quality score recalibration using GATK. The filtered mutations were then annotated by ANNOVAR (47).
Statistical analysis
The R package pheatmap was used to plot the clustered heatmaps using Ward’s criteria (48). Naïve Bayes implemented in R package caret was performed to build the binary classifiers to separate the TNBC patients from the CTRL. All data were randomly split into the training group (60%) and the testing group (40%) preserving the overall class distribution of the data. The optimal parameters were selected by 5-fold cross validation. Several criteria including area under the receiver operating characteristics curve (AUC), accuracy, sensitivity, and specificity were used to measure the performance of the classifier. Accuracy was calculated as the proportion of true positives and true negatives in all evaluated cases, sensitivity was the probability of testing positive with the disease present, and specificity was the probability of testing negative with no disease present. The R package ggscatter was used to draw the scatter plots and linear regression lines with 95% confidence interval. Pearson correlation coefficients and p-values were also labeled on the plot. The NMRCLUST (49) method was used to cluster the pooled samples into an optimal number of classes. This method uses average linkage clustering followed by penalty functions to simultaneously optimize the number of clusters and the average spread of clusters. All statistical analyses and plots were conducted using in-house scripts developed in Perl, Python, R, and MATLAB.
Data availability
The data that support this study are provided in supplementary tables. Whole exome sequencing data have been deposited in the Short Read Archive (SRA) under the BioProject ID PRJNA497746 (https://www.ncbi.nlm.nih.gov/sra/PRJNA497746). We have implemented our methods as a web server, which is freely available at http://philrivers.vicp.io:9900/.
Results
A machine learning model to delineate genome wide TNBC susceptibility
As populations tend to go through purifying selection for complex disease susceptibility mutations, we hypothesize that rare non-synonymous germline mutations are functional in TNBC susceptibility. The functional importance of these rare mutations, on the other hand, is hard to be delineated due to allelic heterogeneity and the possibility that they might work as a network collectively. We aspired to build a machine learning framework (Damage Assessment of Germline shotGun, DAGG) that projects the rare non-synonymous germline mutations of an individual in high-dimensional but sparse onto an informative low-dimensional spectrum of sixty signaling pathways, which have been concurrently reported in association with cancer, drug target or immune related terms in public knowledge databases.
DAGG included three components that work sequentially as illustrated in Fig. 1. The first component took the non-synonymous mutations and expression as input and the effect of each mutated gene on the expression of all genes as output. Genes were coded as 1 or 0 represent any mutations presented or not, respectively. Then a Z-score, defined as the driving force, to measure the statistical significance of causal connection from the mutations on DNA to RNA expression profiles was calculated for each gene using both the mutation and expression data from the COSMIC Cell Line Project (50). Briefly, for any given mutated gene vi, all cell lines were separated into two groups: MT group and WT group based on whether they carried the non-synonymous mutations in this gene vi. To investigate how the expression of any gene gjcould be impacted under the given mutation profile of gene vi, the difference of the mean expression of gene gj between the MT and the WT group was quantified and defined as the differential expression score Δij. To account for the background effect, we randomly group cell line while maintaining groups size for 100,000 times, the Z-score (as the driving force Fij) could be calculated with the following equation, where Δnull stands for the differential expression score for each random grouping while mean and std calculate their mean and standard deviation respectively. A bigger absolute value of Fij represents stronger effect (called deterministic expression when |Fij| ≥ 3) of the mutated gene vi on the expression of gene gj, and we took values either no smaller than 3 or no larger than −3 as the cutoff for significance. Traversing all the genes (assuming the total number of mutated genes is m and the total number of genes with expression data is n) through these steps would generate an m x n matrix of driving force (Fig.1B) and project the sparse mutation profile in a discrete space into a continuous space related to the gene expression profile, thus introducing the global gene regulation functions of non-synonymous mutations based on COSMIC cell line dataset in the first component.
The second component translated the global changes in gene expression into activation or repression of signaling pathways. First, the effect of one mutated gene vi on one pathway Pj activity was defined as the following equation:
Where F is the driving force from the first component, jk is the column position in the m x n matrix of the kth gene in the pathway, a is the total number of genes involved in the jth pathway, wk is the weight of the kth gene in the pathway. The weights for up- and down-regulated pathway signature were 1 and −1, respectively. The Z-score of FPij was calculated based on 100,000 times simulation in which a random gene set having the same number of genes as the specified pathway does was randomly selected for the calculation of the simulated FP. For a specific individual carried b mutated genes, the collective effect of these mutated genes on the pathway activity was defined as the summation of Z-score of FP of each mutated gene. Then another Z-score of the sum was calculated based on 100,000 times simulation in which b genes were randomly selected for the calculation of the simulated summation. The second component further reduces the high-dimensional sparse mutation data to features of sixty pathways (1 x 60 matrixes compared to m x n in the first component) and made it easier to make comparisons among different individuals.
We hypothesized that the phenotypically similar individuals should have common pathway activity spectrum features which were adequately distinguishing between phenotypically differed cohorts, i.e. the TNBC and the healthy control. Therefore, we pre-selected pathways that were the most significantly impacted by rare non-synonymous germline mutations with Z-scores no larger than −4, and constructed the third component to assign each individual in pooled cohorts to specific classes and groups based on activity spectra of these pathways. Note the number of clusters determined by minimizing the spread across each cluster. The relative risk in each class was then calculated, followed by stratification of these classes into three groups by the range of relative risks and z-scores, where Group 1 was with the lowest combined relative risk and Group 3 the highest.
TNBC patients significantly differentiated from healthy subjects by pathway activity spectrum
To lay the foundation, we first analyzed whole exome sequencing (WES) data of rare non-synonymous germline mutations from a cohort of 287 cancer free elderly women over 80 years old (287 subjects from LOAD cohort (43)) that used as the CTRL cohort, whom we postulate the germline genetics begets the natural resistant against TNBC to throughout evolution. After obtaining the features of their pathway spectra, we observed an intriguing phenomenon that all individuals exhibit either one of two typical differentiated pathway spectrum patterns that we identified as Type A and Type B (Fig. 2A). Among Type A subjects, the functional effect of the rare non-synonymous mutations could be clearly defined as up-regulation or down-regulation of the pathways. This means the functional effect of the mutations on the pathway activity was sufficiently significant to yield z-scores of large absolute values, so that clear cut-off could be set to define up- and down-regulation of pathways and generate high-contrast pattern on the heatmap. On the contrary, the functional effect of the rare mutations in Type B was ambiguous, demonstrating an overall reluctance against pathway activity alterations marked with low contrast patterns on the heatmap. These findings suggested an inherent difference in the “tolerance” of rare non-synonymous germline mutations among cancer free populations. Next for the TNBC cohort, we carried out WES of peripheral blood cells collected from 106 TNBC patients with a median depth > 120X in contrast to > 60X in the CTRL. Similarly, the TNBC patients also exhibited either Type A or Type B of overall pathway spectrum patterns (Fig. 2B). This finding indicates that the pathway spectrum typing is ubiquitous across populations as an inherent tolerance against rare non-synonymous germline mutations, potentially independent of the presence of TNBC. This also prompts that comparison of patients and healthy individuals should only be made after typing in order to rule out restrictions invited by the difference of tolerance against rare non-synonymous germline mutations. When we analyzed the pathway functional features within the Type A category, significant differences between CTRL and TNBC patients were identified from the heatmap in Fig. 2C, with TNBC patients well separated from the CTRL (AUC = 0.89) (Fig. 2D). Meanwhile, TNBC patients could also be separated from the CTRL in the Type B category (AUC = 0.69, Fig. 2E-F). These results suggest that global germline genetics plays a role in the susceptibility of TNBC, even though it is traditionally believed that the hereditary risk is relatively low for this type of cancer when germline mutations are reviewed individually.
Functional pathway features coded by TNBC germline rare non-synonymous mutations
To mine the functional features specific to TNBC, we used the 52 TNBC patients and 102 CTRL in type A to compare the sixty pathways activities between the two cohorts. We calculated the differences in each pathway by subtracting the averaged values of a pathway in CTRL from that in TNBC. Next, we conducted 100 million times of permutation. Each permutation randomly shuffles the label of CTRL and TNBC for each subject so that the sample size of each group remains unchanged. Then the significance was calculated as the standard score that measures how many standard deviations deviated from the population mean (defined by Z-score). The results revealed that only 11 out of the 60 pathways had been positively regulated (3 of them are significantly regulated, Z-score ≥ 3), whereas the other 49 had been negatively regulated (25 pathways are significantly regulated, Z-score ≤ −3) (Fig. 3A-B). With approximately 87% signaling pathways suppressed in the activity, these results provide evidence that the germline genetics in TNBC patients tends to increase the vulnerability via suppression over the global pathway activity instead of over-activation on selected pathways.
Similar analyses were carried out for Type B CRTL and TNBC (Fig. 3C-D), and the global suppression on pathways (43 out of 60) was still a major event in terms of the down-regulation, despite only 12 pathways had significance with z-score ≤ −3. Although 17 pathways were defined as up-regulated in Type B, as opposed to 11 pathways in Type A, none of these pathways had sufficient significance with Z-score ≥ 3. Nonetheless, the pathway activity difference between TNBC patients and correspondent CTRL demonstrated similarities between Type A and Type B, giving a fitness of linear regression with r = 0.7 and p = 4.5e-10 for their functional alterations (Fig. 3E). This suggests that the functional pathway deviation along with TNBC development was conserved no matter what the starting pattern typing is, i.e. Type A or Type B. In addition, immune-related pathways were commonly down-regulated in both Type A and Type B (Z scores are all smaller than 0, Red dot in Fig. 3E).
Activities of selected pathways define relative risk of TNBC
We hypothesize that a more detailed classification will be helpful to better determine the TNBC susceptibility from general population. Therefore, we applied a penalty model (51), which provided us an optimal total number of classes in each type, to the combined population of TNBC patients and CTRL subjects. The optimal number of classes was fifteen for Type A and thirteen for Type B (Fig. 4A-B). The classes of Type A and those of Type B were presented by eight and six selected pathways (Z-score <= −4), respectively. TNBC patients and CTRL subjects were not uniformly distributed among different classes (Table 1 and Table 2). The odds of TNBC patients to CTRL individuals was calculated to evaluate the relative risks of each class, where classes with more TNBC patients were esteemed as of high relative risk and classes with more CTRL subjects as of low relative risk plus classes with approximately even proportions as of moderate relative risk.
Fig. 4C and Fig. 4D presented the pathway patterns in different classes that were in descending order of relative risk in Type A and Type B, respectively. The average Z-score of the pathways in each class and the relative risk followed a power law distribution (p = 1.8e-6, Fig. 4E) in Type A, while a linear correlation was found in Type B (p=2.1e-4, Fig. 4F). These finding reinforce that suppression of pathway activity is highly correlated with the germline genetics begetting TNBC vulnerability. After further analyzing the distribution statistics of each class for better interpretation, classes of Type A and those of Type B were separately merged into three groups (Table 3 for Type A and Table 4 for Type B). Compared to the classes, the groups had better-defined gradient descending patterns of the average pathway activity along with the increasing of relative risk (Fig. 4G-H), and thus provided more feasible stratifications for phenotype associating analyses.
Clinical outcomes associated with relative risk defined in groups
After further classification and annotation, Type A and Type B individuals with pathogenic or rare non-synonymous mutations in BRCA1/2 from the TNBC cohort were remarkably enriched in specific groups of distinctive relative risks. BRCA1/2 rare non-synonymous mutation carriers were enriched in Group 3 in Type A (Fig. 5A) whereas Group 2 in Type B (Fig. 5B). Based on 100,000 times of permutation of the class labels, we found the patients in class 3 (part of Group 3) of Type A were more likely to carry BRCA1/2 non-synonymous mutations with statistical significance (p = 0.0078), whereas class 1 and class 5 in Type B (both were parts of Group 2) enriched more patients carrying BRCA1/2 (p = 0.0227 and p = 0.0396, respectively) non-synonymous mutations. This divergent distribution might introduce a more accurate prediction model of the BRCA1/2 non-synonymous mutation pathogenicity when combined with population typing, specifically that BRCA1/2 non-synonymous mutation carriers in Type A have a higher risk than in Type B, which could help better assess the general disease vulnerability in individuals bearing relevant functional mutations.
Lymph node metastasis and age of onset could also be associated with the relative risk calculated by this model. Statistical annotation showed that lymph node metastasis was associated with class 3 (p = 0.0014, relative risk = 11.71), class 5 (p < 0.0001, relative risk = 0.44), class 8 (p < 0.0001, relative risk = 0.39) in Type A, as well as class 4 (p = 0.0038, relative risk = 2.0) and 12 (p = 0.0428, relative risk = 4.26) in Type B (Fig. 5A-B). With respect to the age of onset of TNBC, a gradual decrease was observed along with the increasing relative risk from Group 1 to Group 3 in both Type A and Type B (Fig. 5C-D), where patients in Group 1 of Type B showed a significantly higher onset age (p = 0.029, relative risk = 0.27).
Discussion
Previous screening approaches for genetic predisposition were generally based on partial high-penetrance gene mutations. These approaches found only a limited number of breast cancer cases were attributable to heredofamilial disorders (52), leaving the vast majority of cases as sporadic-like, including TNBC (18). We hypothesized that these sporadic-like cases involved germline mutations. Such sporadic-like cases often display remarkable non-Mendelian genetic features, such as polygenic traits, incomplete- and co-dominance (53,54). On the one hand, these features make it difficult to rule out familial inheritance, but on the other hand favor our hypothesis of using holistic germline information to characterize sporadic-like TNBC. Our approach assumes that rare non-synonymous germline mutations predispose individuals in a population to cancer, whereas somatic mutations initiated and triggered the progression of the cancer (55). Furthermore, we postulated that breast cancer is a ‘bubbling’ and persistent pathogenic stress on the female population, and that resistance against this stress is evolutionary selected. As an evolutionary population trait, the resistance must be encoded in a complex multi-factorial genetic network, which can only be disrupted by the collective effects of multiple mutational events. These assumptions underlie our rationale to integrate the holistic information of all rare mutations in the entire coding regions of germline DNA to evaluate the quantitative status of this resistance.
Our knowledge-data-driven hybrid machine learning framework can project layer by layer high-dimensional (more than ten thousand), inscrutable, binary, and sparse mutation data onto lower-dimensional (dozens) but information-rich data with quantitative features. The data is integrated and transformed by the built-in machine learning algorithms, and each layer can take the output of the previous layer as the input to generate the next output. We use this approach to evaluate the risk of developing TNBC in an individual. The model “comprehends” the rare genetic mutations in each female as a whole, assesses the level of risk and classifies them as having either high or low risk accordingly. Using our machine learning approach to understand the overall genetic information, we found that in the Type A population, TNBC patients could be separated from the CTRL individuals (AUC = 0.89, Fig. 2D). This finding led to the first key conclusion that in patients with sporadic TNBC, the cause of disease might be systematically rooted in the germline mutations. In addition, the driving force of the etiology in Type A and Type B cohorts followed in an analogous manner, as seen with an r of 0.70 (p = 4.5e-10, Fig. 3E) in the significance correlation analysis.
More specifically, immune-related pathways, such as B-cell receptor, NF-κB, CD40, and lymphotoxin-B receptor signaling, were found to be involved in the relationship between pathways activities and TNBC risk. The degree of suppression in these pathways was rigorously correlated with the relative risk for TNBC when compared with CTRL. The highest levels of suppression were significantly associated with increased risk in both Type A and Type B. Furthermore, pooled Type A and Type B cohorts were clustered into three distinct groups in accordance to their pathway activity spectrum. The relative risk in the TNBC cases was calculated compared with the CTRL within each of the three groups. The relative risk intrinsically indicates the possible risk in any randomly selected sample of CTRL or TNBC in any given group with similar pathway activity spectra, and thus quantitatively describes the TNBC risk in the given activity group. We observed the relative risk increased from group 1 to group 3 in both Type A and Type B, whereas the onset age gradually decreased. The lymph node metastasis was mainly found in group 2 and group 3 in both Type A and Type B. Patients with BRCA1/2 mutations were more likely to be found in group 3 of Type A and in group 2 of Type B.
We used a cohort of cancer-free elderly women as the control. This group had an average age above 80, which we believe represents the minimum risk under specific ethnic and geographical conditions, as they were free of cancer for these many years. This enabled us to map the TNBC risk according to the relative risk of each pathway activity group. However, this sample of healthy senior females might not fully represent general ethnic and geographical populations, as well as age ranges. More representative samples will be needed in future studies to link other phenotypical traits with the pathway activity spectra through our model.
Conclusions
Our machine learning framework (DAGG) views the germline information buried in individual genomes differently. This model does not consider for individual mutation separately in the gene level but projects all rare germline mutations into each of the signaling pathways in the functional level (signaling pathway activities) and then uses the spectra that consist of multiple pathways to delineate pathological characters of each individual. The results have validated the hypothesis that relative risks indicated by the germline-decided pathway activity in DAGG could serve as an early index for predicting TNBC related pathology. The TNBC related pathology could include but not limited to onset, metastasis relevant prognosis and resistance against pathogenic mutations. These provides us two advantages: the first is the possibility of early identification of TNBC high-risk. This allows better treatment and improves prognosis. The second is the methodological generality of DAGG that enables applications in other type of genetic-related cancer types once training datasets are properly selected.
Disclosure of Potential Conflicts of Interest
The authors declare no potential conflicts of interest.
Author contributions
GN and YF conceived the experiments. GN, KW, MY, and ZW, designed the experiments. MY, QZ, XL, TZ, MC, JX, CY, and HG performed the experiments. YF, GN, QZ, SH, CZ, GT, YQS, and MQZ analyzed the data. YF, MY, ZF, and GN wrote the paper. All authors discussed the results and contributed to the final manuscript.
Grant Support
This work was supported by the National Natural Science Foundation of China [81202076 to MY, and 81871513 to KW]; the Guangzhou Science and Technology Program [2014J2200007 to MY]; and the Natural Science Foundation of Guangdong [2017A030313882 to KW].
Acknowledgement
The authors would like to thank Yanxiang Ni, Jin Gu, Xu Li, and Zhonghai Zhang for feedback on the manuscript and helpful discussions.