Cis-Compound Mutations are Prevalent in Triple Negative Breast Cancer and Can Drive Tumor Progression

About 16% of breast cancers fall into a clinically aggressive category designated triple negative (TNBC) owing to a lack of ERBB2, estrogen receptor and progesterone receptor expression1-3. The mutational spectrum of TNBC has been characterized as part of The Cancer Genome Atlas (TCGA)4; however, snapshots of primary tumors cannot reveal the mechanisms by which TNBCs progress and spread. To address this limitation, we initiated the Intensive Trial of OMics in Cancer (ITOMIC)-001, in which patients with metastatic TNBC undergo multiple biopsies over space and time5. Whole exome sequencing (WES) of 67 samples from 11 patients identified 426 genes containing multiple distinct single nucleotide variants (SNVs) within the same sample, instances we term Multiple SNVs affecting the Same Gene and Sample (MSSGS). We find that >90% of MSSGS result from cis-compound mutations (in which both SNVs affect the same allele), that MSSGS composed of SNVs affecting adjacent nucleotides arise from single mutational events, and that most other MSSGS result from the sequential acquisition of SNVs. Some MSSGS drive cancer progression, as exemplified by a TNBC driven by FGFR2(S252W;Y375C). MSSGS are more prevalent in TNBC than in other breast cancer subtypes and occur at higher-than-expected frequencies across TNBC samples within TCGA. MSSGS may denote genes that play as yet unrecognized roles in cancer progression.

3 subsequent therapies. Additional samples are obtained from archival tissues, from material left over after clinically indicated procedures, and from tissues taken at autopsy. Samples are selected for sequencing based on specimen size and tumor content (Fig. 1a). Results from 11 of the first 12 subjects are presented here; low tumor purity in Subject 08 precluded analysis of that subject. We performed WES of germline DNA and 67 tumor samples (Extended Data Table 1; Table 1).
Our analysis focused on somatic single nucleotide variants (SNVs), which comprise the majority of mutations in breast cancer4,6,7. WES identified a total of 8449 SNVs, of which 7067 occurred within 5136 protein-coding genes, including 43 breast cancer driver genes7 (0.84% of affected genes; data not shown). Across all 67 samples, 1403 genes were affected by >1 SNV. Remarkably, 426 genes contained >1 SNV within the same tumor sample (Extended Data Table 1); we designated these instances Multiple SNVs affecting the Same Gene and Sample (MSSGS) (Fig. 1b).
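The MSSGS definition amounts to grouping somatic SNV calls by sample and gene and keeping the groups that contain more than one distinct variant. The Python sketch below illustrates that grouping under stated assumptions: the tuple layout of the input, the function names, and the example genes and positions are all hypothetical and are not taken from the ITOMIC pipeline.

```python
from collections import Counter, defaultdict


def find_mssgs(snv_calls):
    """Group SNV calls by (sample, gene) and keep groups with >1 distinct SNV.

    snv_calls: iterable of (sample, gene, chrom, pos, ref, alt) tuples
    (a hypothetical input layout). Returns {(sample, gene): sorted SNVs}.
    """
    by_sample_gene = defaultdict(set)
    for sample, gene, chrom, pos, ref, alt in snv_calls:
        # A set collapses duplicate calls of the same variant
        by_sample_gene[(sample, gene)].add((chrom, pos, ref, alt))
    return {key: sorted(snvs) for key, snvs in by_sample_gene.items() if len(snvs) > 1}


def mssgs_per_sample(mssgs):
    """Number of MSSGS found in each sample."""
    return Counter(sample for sample, _gene in mssgs)


# Hypothetical SNV calls for two samples
calls = [
    ("S1", "GENE_A", "chr1", 1000, "C", "T"),
    ("S1", "GENE_A", "chr1", 1003, "G", "A"),  # second distinct SNV in GENE_A, same sample
    ("S1", "GENE_B", "chr2", 5000, "A", "G"),
    ("S2", "GENE_A", "chr1", 1000, "C", "T"),  # GENE_A hit only once in S2
    ("S2", "GENE_B", "chr2", 5000, "A", "G"),
    ("S2", "GENE_B", "chr2", 5000, "A", "G"),  # duplicate call, not a second SNV
]
mssgs = find_mssgs(calls)
print(mssgs)
print(mssgs_per_sample(mssgs))
```

In this toy input GENE_A appears in both samples but qualifies as an MSSGS only in S1, where it carries two distinct SNVs.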
MSSGS were observed in 65 of 67 samples, with a median of 8 and a range of 0 to 133 MSSGS per sample (Extended Data Table 1). Median transcript sizes were not larger for genes affected by MSSGS than for genes affected by single SNVs8 (Extended Data Fig. 1). Concomitant . 2a) with the exception of an MSSGS involving MUC4, which fell within a region of kataegis9 (Fig. 2b).
MSSGS can arise when two or more SNVs affect: i) the same gene in different tumor cells, ii) different alleles in the same cell, or iii) the same allele in the same cell (cis-compound mutation) (Fig. 1c). To assess how often SNVs contributing to MSSGS co-localize to form a cis-compound mutation, we cloned and sequenced tumor DNA from eight patients, revealing a very high frequency of cis-compound mutations (12 of 13 evaluable MSSGS, 92%) (Table 2). Haplotype phasing (ref. 10) of 407 MSSGS similarly indicated that >90% were attributable to cis-compound mutations (Extended Data  Fig. 4c), and ponatinib treatment in the patient produced a regression of breast cancer infiltrates in the skin (Fig. 4d). Possibly indicative of a growth advantage conferred by other MSSGS are results from

We reasoned that if cis-compound mutations can confer a selective advantage in TNBC, MSSGS might be more prevalent than would be predicted from random co-localization of SNVs. To address this question, we evaluated 106 TNBC samples from TCGA. These results indicate that for many samples, the number of SNVs contributing to MSSGS is higher than would be predicted had they arisen independently. To further examine this phenomenon, we performed a permutation test for each of the 106 patients to test the null hypothesis that SNVs co-localize to form MSSGS randomly (details are provided in Methods). Using the Benjamini-Hochberg (BH) procedure16 to control the false discovery rate at 0.05, we rejected the null hypothesis for six of the 106 patients (Fig. 5b).
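For readers unfamiliar with the BH procedure, the Python sketch below shows the standard step-up rule: sort the m p-values, find the largest rank k with p(k) <= k * alpha / m, and reject the null hypothesis for the k smallest p-values. The function name and example p-values are hypothetical; this illustrates the textbook procedure and is not the code used to produce Fig. 5b.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure.

    Finds the largest rank k (1-based, p-values sorted ascending) such that
    p_(k) <= k * alpha / m, then rejects the k smallest p-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / m:
            k_max = rank  # keep the largest passing rank (step-up rule)
    return sorted(order[:k_max])


# Hypothetical per-patient p-values
print(benjamini_hochberg([0.02, 0.03, 0.031, 0.5]))  # [0, 1, 2]
```

In this example 0.02 and 0.03 are rejected even though each exceeds its own rank threshold (0.0125 and 0.025), because the step-up rule rejects every hypothesis up to the largest passing rank.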

Biopsies
Biopsies of metastases involving lymph nodes, subcutaneous tissues, and liver were performed under ultrasound guidance using an 18-gauge BioPince Full-Core Biopsy instrument. Up to 5 disease sites were biopsied in a single session, and multiple biopsies were performed per disease site. Bone marrow biopsies were taken from the iliac crest using a Jamshidi T-Handle needle. Skin biopsies were performed using a 3-4 mm punch biopsy instrument. In Subject 2, circulating tumor cells were collected by leukapheresis. All biopsies were performed under local anesthesia, and most were also performed under conscious sedation. Biopsy specimens were processed according to a standardized set of operating procedures, and subjects were contacted one day and one week after the procedure to assess complications.

Rapid On-Site Evaluation
Core biopsy samples of metastases involving lymph nodes, soft tissues, or liver were immediately photographed and divided orthogonally. One half was fixed in formalin and embedded in paraffin (FFPE); the other half was gently pressed against an RNase-treated slide to generate a touch prep and then immediately snap frozen. A pathologist evaluated the touch prep cytologically in real time and, if necessary, additional biopsies were procured to optimize sample size and tumor content. The interval between sample acquisition and the completion of processing (the cold ischemia time) was recorded and, except for bone marrow biopsies and leukaphereses, was less than 5 minutes in all cases (Table 2).

Residual Clinical Materials
When feasible, leftover blood samples and pleural fluid specimens were processed and stored for analysis. The estimated time between sample collection and processing was recorded.

Autopsy Samples
At

Whole Genome Sequencing
A subset of MSSGS was confirmed by NantOmics (Culver City, CA), which performed whole genome sequencing (WGS) on 39 of the 67 samples described here, using the same DNA that had been analyzed by WES. Sequencing was performed on the Illumina HiSeq X platform using libraries prepared with the KAPA Hyper Prep kit. Tumor genomes were sequenced to an average depth of 60×, and normal genomes to an average depth of 30×. Mutations were identified using the CLIA-validated NantOmics Contraster pipeline as previously described35.

Germline Genome
Germline variant detection is performed to identify mutations in genes associated with inherited cancer syndromes, such as BRCA1 and BRCA2.

Clinical data are entered into REDCap, a web-based application for Electronic Data

Comparing sizes of genes affected by MSSGS versus isolated SNVs
To assess whether the formation of MSSGS is related to gene size, we compared the distributions of median transcript sizes between genes affected by MSSGS and genes affected by isolated SNVs. We used the Wilcoxon rank-sum test to compare the two distributions and found no significant difference (Extended Data Fig. 1). Gene transcript sizes were obtained from BioMart (Ensembl).
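A minimal pure-Python sketch of the Wilcoxon rank-sum test is shown below, assuming average ranks for ties and the large-sample normal approximation without a tie correction to the variance. The transcript lengths are made-up illustrative values, and the function names are hypothetical; in practice an established implementation such as `scipy.stats.mannwhitneyu` would normally be preferred.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (W, z, p), where W is the rank sum of sample x.
    No tie correction is applied to the variance.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p


# Hypothetical median transcript lengths (bp), for illustration only
mssgs_lengths = [2400, 3100, 5200, 1800, 4100, 2900]
single_snv_lengths = [2200, 3500, 1900, 4800, 2600, 3000, 2100]
w, z, p = rank_sum_test(mssgs_lengths, single_snv_lengths)
print(f"W = {w}, z = {z:.3f}, two-sided p = {p:.3f}")
```

Because the test works on ranks, it compares whether one distribution tends to take larger values than the other rather than comparing means, which makes it robust to the long right tail typical of transcript lengths.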

Haplotype phasing of SNVs associated with MSSGS
To assess the phase of neighboring SNVs within MSSGS, we used ReadBackedPhasing from the Genome Analysis Toolkit (GATK) version 3.3.0. Phasing calls with a quality score of at least 20.0 were used to determine whether a pair of SNVs was in cis or in trans.
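ReadBackedPhasing scores phase with a probabilistic model over reads spanning heterozygous sites. The toy Python sketch below conveys only the underlying idea with simple read counting: reads carrying both alternate alleles support a cis configuration, whereas reads carrying exactly one alternate allele support trans. This is a swapped-in simplification, not GATK's algorithm; the read representation, the `min_reads` threshold (a hypothetical stand-in for GATK's phasing-quality cutoff), and the example reads are all assumptions.

```python
def phase_pair(reads, snv1, snv2, min_reads=3):
    """Toy cis/trans call for two heterozygous SNVs from read-level alleles.

    reads: list of dicts mapping genomic position -> base observed on that read.
    snv1, snv2: (position, alt_base) tuples.
    Returns "cis", "trans", or None when evidence is insufficient or tied.
    """
    (pos1, alt1), (pos2, alt2) = snv1, snv2
    cis = trans = 0
    for read in reads:
        if pos1 not in read or pos2 not in read:
            continue  # only reads spanning both sites are informative
        has_alt1 = read[pos1] == alt1
        has_alt2 = read[pos2] == alt2
        if has_alt1 and has_alt2:
            cis += 1  # both variants on the same molecule
        elif has_alt1 != has_alt2:
            trans += 1  # exactly one variant on this molecule
        # reads carrying neither alternate allele are ignored
    if cis + trans < min_reads or cis == trans:
        return None
    return "cis" if cis > trans else "trans"


# Hypothetical reads over SNVs at positions 100 (C>T) and 103 (G>A)
reads = (
    [{100: "T", 103: "A"} for _ in range(4)]  # alt-alt
    + [{100: "C", 103: "G"} for _ in range(3)]  # ref-ref
    + [{100: "T"}]  # does not span position 103
)
print(phase_pair(reads, (100, "T"), (103, "A")))  # cis
```

Read-based phasing of this kind works only when both SNVs fall within a single read or read pair, which is why it is best suited to closely spaced SNVs such as those forming many MSSGS.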

SNVs.
To assess whether the formation of MSSGS is driven by elevated local mutation rates, we compared local mutation rates surrounding MSSGS with genome-wide mutation rates. To calculate the local mutation rate for each MSSGS, we counted the

To assign a patient-specific P-value, we repeated the above simulation procedure using the observed value of m. We then compared the observed count of SNVs contributing to MSSGS with the 10,000 counts from the simulation. We defined the P-value as (t + 1)/10001, where t is the number of simulations in which the simulated count of SNVs contributing to MSSGS was greater than or equal to the observed count.
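The P-value definition can be illustrated with a short simulation. The Methods text describing how SNVs are randomly placed is elided here, so the sketch below assumes a simple null model in which m SNVs (assumed to be the sample's observed SNV count) are assigned to genes independently with probability proportional to gene length. The function names and gene lengths are hypothetical. The sketch counts SNVs falling in genes hit at least twice and returns (t + 1)/(N + 1), which equals (t + 1)/10001 for N = 10,000 simulations.

```python
import random
from collections import Counter
from itertools import accumulate


def count_mssgs_snvs(gene_hits):
    """Number of SNVs that fall in genes hit two or more times (SNVs contributing to MSSGS)."""
    return sum(n for n in Counter(gene_hits).values() if n >= 2)


def permutation_p_value(observed, m, gene_lengths, n_sim=10_000, seed=0):
    """Permutation P-value under an ASSUMED null model.

    Null model (hypothetical): each of the m SNVs lands in a gene chosen
    independently with probability proportional to gene length.
    t counts simulations whose MSSGS-SNV count is >= the observed count.
    """
    rng = random.Random(seed)
    genes = list(gene_lengths)
    cum_weights = list(accumulate(gene_lengths[g] for g in genes))
    t = 0
    for _ in range(n_sim):
        hits = rng.choices(genes, cum_weights=cum_weights, k=m)
        if count_mssgs_snvs(hits) >= observed:
            t += 1
    return (t + 1) / (n_sim + 1)


# Hypothetical example: 40 SNVs in one sample, 12 of which contribute to MSSGS
lengths = {f"GENE_{i}": 1_000 + 50 * i for i in range(500)}
p = permutation_p_value(observed=12, m=40, gene_lengths=lengths)
print(f"P = {p:.4f}")
```

Adding 1 to both numerator and denominator counts the observed data as one of the permutations, so the P-value can never be exactly zero; with 10,000 simulations the smallest attainable value is 1/10001.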

High Throughput Drug Screen
The drug sensitivity profiles of transiently cultured cells from a pleural effusion from