Multi-omic analysis of subtype evolution and heterogeneity in high-grade serous ovarian carcinoma

Multiple studies have identified transcriptome subtypes of high-grade serous ovarian carcinoma (HGSOC), but these have yet to impact clinical practice. Interpretation and translation of HGSOC subtypes are complicated by tumor evolution and polyclonality accompanied by accumulation of somatic aberrations, varying cell type admixtures, and different tissues of origin. The chronology of HGSOC subtype evolution was examined in the context of these factors by a novel integrative analysis of bulk absolute somatic copy number analysis and gene expression in The Cancer Genome Atlas, complemented by single-cell RNA-seq analysis of six independent tumors. The approach was validated by contrast to soft-tissue sarcoma. Genomic lesions associated with HGSOC subtypes tend to be subclonal, implying subtype divergence at later stages of tumor evolution. Subclonality of recurrent HGSOC alterations is particularly evident for proliferative tumors, characterized by extreme genomic instability, absence of immune infiltration, and greater patient age. In contrast, differentiated tumors are characterized by largely intact genome integrity, high immune infiltration, and younger patient age. We propose an alternative model to discrete subtypes of HGSOC, in which tumors develop from an early differentiated spectrum to a late proliferative spectrum, along a timeline characterized by increasing genomic instability and subclonal expansion. The proposed methods provide a new approach to investigating tumor evolution through multi-omic analysis. Statement of Significance This study proposes a method to infer whether transcriptome-based groupings of tumors differentiate early in carcinogenesis and are therefore potentially appropriate targets for therapy, and demonstrates that this is not the case for high-grade serous ovarian carcinoma (HGSOC). Significant findings for HGSOC include: Tumor purity, ploidy, and subclonality can be reliably inferred from different genomic platforms and show marked differences between subtypes Recurrent DNA alterations are associated with subtypes and tend to occur more frequently in subclones Single-cell sequencing of 42,000 tumor cells reveals widespread heterogeneity in tumor cell type composition that drives bulk subtype calls, but demonstrates a lack of intrinsic subtypes among tumor epithelial cells Findings prompt the dismissal of discrete transcriptome subtypes for HGSOC and replacement by a more realistic model of continuous tumor development that includes mixtures of subclones, accumulation of somatic aberrations, infiltration of immune and stromal cells in proportions correlated with tissue of origin and tumor stage, and evolution between properties previously associated with discrete subtypes


Introduction
High-grade serous ovarian cancer (HGSOC) is the most common histological subtype of ovarian cancer, accounts for 70-80% of ovarian cancer deaths, and is associated with poor prognosis and frequent relapse 1,2 . HGS ovarian cancer is a genomically complex disease that is characterized by ubiquitous TP53 mutations 3 , frequent loss of RB1, NF1 and PTEN by gene breakage events 4,5 , and recurrent high-level copy number amplifications 6 .
Molecular stratification of HGSOC is difficult due to the genomic complexity and extensive heterogeneity of the disease. Clinically relevant genomic stratification is currently restricted to the identification of homologous recombination deficiency (HRD), a condition that is present in roughly half of all HGS ovarian tumors, and that is attributable to germline or somatic BRCA mutations in approximately 20% of HGSOC cases 7,8 .
Several studies also reported molecularly distinct subtypes by clustering tumors together that have similar transcriptome profiles 4,[9][10][11][12] . The Cancer Genome Atlas (TCGA) project reported four subtypes 4 and named them based on marker gene expression: differentiated, immunoreactive, mesenchymal, and proliferative. These subtypes were found associated with several clinical and tumor pathology characteristics 4,11,13 , potentially reflect different tissues of origin 14 , and partly reflect differences in immune cell 15 and stromal 16 content. Robustness and clinical utility of the transcriptome subtypes remain controversial 17,18 . Based on a compendium of 15 microarray datasets consisting of ≈1,800 HGSOC tumors, subtype classifiers were not robust to re-fitting in independent datasets and grouped only one third of patients concordantly into four subtypes 13 .
Most HGSOC tumors are polyclonal, meaning that a single tumor is a heterogeneous assembly of distinct cancer genotypes arising from different subclones. Lohr et al. estimated that 95% of the TCGA HGSOC tumors are polyclonal, and ≈40% consist of ≥4 subclones 19 .
Recent single-cell studies further demonstrated extensive intra-tumoral heterogeneity of HGS ovarian tumors [20][21][22][23] , consistent with the notion of an individual tumor being an intermixture of different malignant cell populations. Transcriptome subtypes might provide a useful summary of bulk tumor properties if these subtypes are shared by different intratumoral subclones, representing an intrinsic difference between tumors. However, we hypothesized that subtype assignment via transcriptome clustering is driven by late events in tumor evolution, so that these subtype properties would be subclone-specific. We test this hypothesis first through a novel use of allele-specific copy number of genomic regions identified as subtype markers in bulk tumors. In contrast to phylogenetic analysis based on longitudinal whole-genome sequencing at multiple time points 24,25 , the proposed approach infers tumor evolution from copy number data obtained with genotyping arrays or exome sequencing at a single time point. We follow up with single-cell RNA-seq (scRNA-seq) analysis of six independent patients and investigate whether individual epithelial cancer cells have different well-defined subtypes and whether they correspond to the proposed subtypes of bulk tumors.

SCNA subtype association
SCNAs detected with GISTIC2 28 were obtained from the final run of the TCGA Firehose pipeline (2016-01-28). SCNAs were classified depending on their type (deletion / amplification) as either normal (0), loss / gain of a single copy (1), or loss / gain of two or more copies (2). Transcriptome clusters, assigning each tumor to a subtype 4 , were retrieved from the 2016-01-28 Firehose run. Subtype association was tested by $ test with = 6.
Multiple testing correction was carried out using an FDR 29 cutoff of 0.1.

Correlation of subtype association with subclonality
Using the $ test statistic as the subtype association score * of an alteration ( Figure 3A), and the fraction of tumors for which this alteration is subclonal as the subclonality score + ( Figure 3B), Spearman's rank correlation was computed to assess the relationship between * and + . Statistical significance of the correlation was assessed using Spearman's rank

Single-cell analysis
Five additional HGSOC specimens were subjected to 10x Genomics single-cell RNA sequencing, producing an average of 235 million reads per patient sample with an average of 11,502 cells/sample and 23,000 reads/cell. Statistical analysis was carried out using a collection of Bioconductor packages for single-cell analysis 32 . Sample collection, experimental procedures, and data processing steps are described in Supplementary Methods S1.

Results
We previously reported a systematic assessment of four reported HGSOC subtypes with respect to robustness and association to overall survival, and found that most tumors cannot be classified reliably, but that it is possible to predict how reliably each tumor can be classified and that some tumors can be classified with high confidence 13 . Here, we investigate an alternative model of HGSOC development in which ambiguity in tumor classification arises as a consequence of accumulated mutations and intra-tumor heterogeneity ( Figure 1).
To test this hypothesis, we focus on somatic copy number alterations (SCNAs) given their causal roles in oncogenesis and their potential to discriminate between cancer subtypes [33][34][35] . The GISTIC2 method detects SCNAs that are more recurrent than expected by chance, in order to distinguish cancer-driving events from random passenger alterations 28 . The method also separates broad arm-level events from narrow focal events, which often harbor oncogenes and tumor suppressors 36,37 . The ABSOLUTE algorithm 30 infers tumor purity and ploidy from the analysis of SCNAs, and incorporates this information for quantification of an alteration's absolute CN per cancer cell. The algorithm also identifies SCNAs not fitting a tumor's purity and ploidy relationship as a consequence of subclonal evolution.
Focusing on the subtypes proposed by TCGA 4 , we integrate information from 516 TCGA cases GISTIC2 and ABSOLUTE SCNA calls to analyze whether recurrent subtype-associated SCNAs display greater intra-tumor heterogeneity than other alterations. We assess the reliability of these calls by absolute CN analysis on whole-exome sequencing data, and complement results with single-cell subtype classification on approximately 42,000 cells from six independent tumors.

Tumor purity, ploidy, and subclonality can be reliably inferred from different genomic platforms and show marked differences between subtypes
Previous studies reported specific clinical and tumor pathology characteristics of the four subtypes 4,13 . Using previously published 31  To establish the reliability of inferring SCNA subclonality with ABSOLUTE from SNP-array data, we investigated whether results are consistent when using whole-exome sequencing data instead. In a separate benchmarking study 39 , we compared results from absolute copy number analysis of genotyping arrays with matched normal samples (the approach used here) to allele-specific copy number analysis of exome sequencing with and without matched normal samples, using the PureCN R/Bioconductor package 40 . There 39 we report that pertumor estimates of purity and ploidy were in good agreement between experimental platforms and computational methods (Pearson correlation of 0.77 for purity and 0.74 for ploidy). This also applied to individual CN calls in GISTIC2 regions, where we found a median concordance of 87.7% of tumors with identical CN state for one GISTIC2 region at a time.

Recurrent DNA alterations are associated with subtypes and tend to occur more frequently in subclones
We analyzed recurrent focal SCNAs as identified with GISTIC2 in TCGA HGSOC tumors for association with subtypes ( Figure 2). We tested all 70 SCNAs identified by GISTIC2, We test the hypothesis that reported HGSOC subtypes differentiate late in tumorigenesis by assessing the correlation between subtype association and subclonality of recurrent SCNAs ( Figure 1A-E). Subtype association of an alteration is calculated via a score * , corresponding to the $ test statistic ( Figure 3A). Subclonality prevalence of an alteration is calculated via a score + , defined as the fraction of samples for which this alteration is subclonal ( Figure 3B).
Under the null hypothesis that subtype-associated alterations occur no earlier or later than other alterations, Spearman correlation between * and + would be expected to be zero: Rejection of 5 has clear interpretation: if subtype-associated SCNAs tend to be subclonal, i.e.
( * , + ) > 0, this suggests that the subtypes are late events in tumor evolution. If subtype-associated alterations tend not to be subclonal, i.e.
We obtained a significant positive correlation between subtype association and subclonality prevalence of the 70 SCNAs ( Figure 3C) . To account for non-independence of the occurrence of different SCNAs, we carried out a permutation test ( = 0.006). When stratifying tumors by purity, since very high or low purity tumors present challenges for allele-specific copy number analysis, the correlation was positive in all strata and did not significantly differ for subtype association, and (ii) tested whether subtype-associated SCNAs tend to predominantly occur clonally (early) or subclonally (late). We first discuss genomic characteristics of the proposed subtypes based on SCNA subtype association and subclonality, and contrast these findings with the results from the single-cell analysis of six HGSOC tumors. We then propose a more realistic model of continuous tumor development that includes mixtures of subclones and evolution between properties previously associated with discrete subtypes and discuss possible implications for chemotherapy.

Inference of subtype evolution from somatic DNA alterations
Reliable inference of subtype evolution from bulk tumor genomics is possible because recurrent CN alterations in HGSOC are associated with the proposed transcriptome subtypes. Absolute CN analysis from SNP arrays and whole-exome sequencing allows detection of subclonal alterations 30,39 , and thus, inference of the subclonality of proposed subtypes. As reported elsewhere 39 , we also found individual CN calls and per-tumor estimates of purity and ploidy to be consistent across experimental platforms (SNP arrays and whole-exome sequencing) and computational methods (ABSOLUTE 30 and PureCN 40 ).
To demonstrate this novel approach, we showed that soft tissue sarcoma (STS) subtypes are driven by clonal SCNAs that occur early in tumor development. HGSOC and STS are both characterized by recurrent SCNAs and low levels of somatic mutations 41 ; comparable analysis of other cancer types that are mainly driven by somatic mutations would require incorporating allele-specific analysis of somatic mutations 30,40 . We note that this method could be extended to allele-specific analysis of point mutations for other cancer types, as long as those mutations are not too ubiquitous or too rare.
In HGSOC, proliferative tumors are associated with a large number of amplifications, higher ploidy and increased frequency of genome duplication. Amplifications associated with proliferative tumors drive a pattern that subtype-associated SCNAs tend to be more frequently subclonal than other alterations. By contrast, differentiated tumors display lower alteration frequency, close-to-normal ploidy, and smaller fractions of subclonal alterations.
Thus, differentiated and proliferative tumors appear to represent different ends of an evolutionary time scale. Since tumors generally evolve from a founder clone with fewer genomic lesions to multiple clones with accumulated lesions 57,58 , these observations indicate that the proposed subtypes are more likely stages along such a process of tumor evolution.

Single-cell sequencing reveals cell type-driven subtype assignments and presents new challenges for subtyping single cells
Single  Figures S4 and S6), which are, however, difficult therapeutic targets 63 .

Conclusions
In this study, we investigated whether ambiguity of gene expression-  Table S3). As a strong relationship exists between patient age and tumor stage of HGSOC 4,64 , this also implies that an earlier stage is likely at least partly due to earlier detection, as opposed to just a more indolent tumor, and that stages likely develop over decades. With the availability of more single-cell data for HGSOC in the near future, it will also be possible to more comprehensively study tumors at critical transition stages before, during, and after treatment. However, only a trend across many recurrent SCNAs is considered evidence for either hypothesis. Analysis of single cell gene expression patterns (F) should also distinguish between the two hypotheses.