FrenchFISH: Poisson models for quantifying DNA copy-number from fluorescence in situ hybridisation of tissue sections

Chromosomal aberration and DNA copy number change are robust hallmarks of cancer. Imaging of spots generated using fluorescence in situ hybridisation (FISH) of locus specific probes is routinely used to detect copy number changes in tumour nuclei. However, this approach often does not perform well on solid tumour tissue sections, where partially represented or overlapping nuclei are common. To overcome these challenges, we have developed a computational approach called FrenchFISH, which comprises a nuclear volume correction method coupled with two types of Poisson model: a Poisson model for improved manual spot counting without the need for control probes, and a homogeneous Poisson Point Process model for automated spot counting. We benchmarked the performance of FrenchFISH against previous approaches in a controlled simulation scenario and demonstrate its use in 12 ovarian cancer FFPE tissue sections, for which we assess copy number alterations in three loci (c-Myc, hTERC and SE7). We show that FrenchFISH outperforms standard spot counting approaches and that automated spot counting is significantly faster than manual counting without loss of performance. FrenchFISH is a general approach that can be used to enhance clinical diagnosis on sections of any tissue.

Author summary

Cancer genomes can look very chaotic because cancer cells are unable to fully repair errors in DNA replication during cell division. While a healthy genome has two copies of every chromosome, in a cancer genome some pieces can be lost completely and others can appear in 50 copies. To diagnose cancers and to decide on the right therapeutic strategy for a patient, it can be very important to know how many copies of a particular piece of DNA exist in a cell. The standard technique used in the clinic to assess DNA copy number is called FISH, short for fluorescence in situ hybridisation. This technique uses fluorescent probes that bind to a DNA piece of interest and show up as glowing spots in a microscopic image. Counting the spots in an image is a labour- and time-intensive process that is generally done by well-trained experts. Here we present a statistical approach to automatically count FISH spots, which outperforms previously proposed methods and has the potential to substantially speed up clinical diagnostics.


Introduction
Chromosomal instability coupled with defective DNA repair can cause loss or duplication of DNA, a characteristic attribute of cancer cells [1]. Interrogation of DNA copy number aberrations is critical for diagnosis [2] and for understanding tumour etiology [1]. Technologies for measuring DNA copy-number have evolved from optical profiling of single loci [3] through to sequencing of the entire tumour genome [4]. However, determining the absolute number of copies from bulk sequencing data remains difficult because of normal cell contamination and intra-tumour heterogeneity [5], and results are generally reported in terms of loss or gain of DNA relative to an assumed diploid or median background. Information from single locus methods is therefore often required to validate estimates of absolute copy-number [6,7].

Fluorescence in situ hybridisation (FISH) of interphase nuclei is the most widely established technique for interrogating single locus copy number. Fluorescent probes are hybridised to a specific genomic region of interest and appear as discrete foci when visualised with fluorescence microscopy [8]. Standard analysis of FISH data relies on time-consuming manual counting of spots in these images [9]. Automated systems to quantify foci using nuclei recognition and spot counting algorithms (reviewed in [10]) aim to make the analysis of FISH data less labour-intensive, faster, and more objective. However, the accuracy of most systems is limited to identification of spots in intact and separated nuclei [11]. These systems can therefore be very successful in haematological malignancies, but diagnostic sections of solid tumour tissue pose a significant challenge for both automated and manual analysis. Accurate identification of single nuclei, either by eye or by automatic image segmentation, can be hard if nuclei cluster closely and overlap (see Fig 1).

Arbitrary cut points between grouped nuclei are typically used to separate these clusters, which can lead to noisy spot count estimates. Additionally, tissue sections are typically 3 µm to 5 µm thick, which is smaller than the diameter of most tumour nuclei, and thus the majority of nuclei are not captured completely in the volume of the section [12,13].

To address these challenges, both manual and automated analysis have been improved by using control probes that bind to a specific locus with known copy-number state n_ctrl [10]. Two commonly used approaches are:

1. Only nuclei containing the expected number of control probes (usually n_ctrl = 2) are used to estimate the copy-number of other loci. The underlying assumption is that if a nucleus contains the expected number of control probes then it is likely that the majority of the nucleus is captured by the section and hence other spots will be well represented.
2. The spot count for the locus of interest is scaled by the ratio of expected over observed control probe copy-number:

$$\hat{n} = s \cdot \frac{n_{\mathrm{ctrl}}}{s_{\mathrm{ctrl}}}$$

where s is the observed spot count for the locus of interest and s_ctrl is the observed control spot count. In this case, the underlying assumption is that the number of observed control spots is linearly correlated with the number of spots observed for the locus of interest.
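The two control-probe approaches can be sketched in a few lines of Python. This is a minimal illustration, not code from FrenchFISH or any published tool: the function names are hypothetical, per-nucleus counts are assumed to be already available as lists, and approach 2 is assumed to pool control counts across nuclei so that a nucleus with zero observed control spots does not cause a division by zero.

```python
from statistics import mean


def control_filtered_estimate(locus_counts, ctrl_counts, n_ctrl=2):
    """Approach 1 (hypothetical helper): average the locus spot counts over
    nuclei whose control-probe count equals the expected n_ctrl; all other
    nuclei are discarded."""
    kept = [s for s, c in zip(locus_counts, ctrl_counts) if c == n_ctrl]
    if not kept:
        raise ValueError("no nucleus shows the expected number of control spots")
    return mean(kept)


def control_ratio_estimate(locus_counts, ctrl_counts, n_ctrl=2):
    """Approach 2 (hypothetical helper): scale the mean locus count by
    n_ctrl / mean observed control count, pooled over all nuclei."""
    observed_ctrl = mean(ctrl_counts)
    if observed_ctrl == 0:
        raise ValueError("no control spots observed")
    return mean(locus_counts) * n_ctrl / observed_ctrl


locus = [3, 2, 4, 1]  # hypothetical per-nucleus spot counts, locus of interest
ctrl = [2, 1, 2, 1]   # hypothetical per-nucleus control-probe spot counts
print(control_filtered_estimate(locus, ctrl))  # 3.5 (only nuclei 1 and 3 are kept)
print(control_ratio_estimate(locus, ctrl))
```

Approach 1 throws away every nucleus with an incomplete control count, which is why it undersamples thin sections; approach 2 keeps all nuclei but relies entirely on the linear relationship between control and target spots.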

However, there are significant limitations associated with both of these methods. For example, a tissue section 3 µm thick containing cells with a nuclear diameter of 9 µm will, on average, have only 41% of each nucleus represented in the section (see Fig 1a).
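The 41% figure can be reproduced with a short calculation, under assumptions that are ours rather than the paper's: the nucleus is a sphere of radius r = 4.5 µm, the section of thickness h = 3 µm lies entirely within the nucleus's vertical extent, and the lower face of the section sits at height d relative to the nucleus midline, with d uniformly distributed over [−r, r − h]. The sampled volume is then

$$V(d) = \pi \int_{d}^{d+h} \left(r^2 - z^2\right) \mathrm{d}z = \pi h \left[ r^2 - \left(d + \tfrac{h}{2}\right)^2 - \tfrac{h^2}{12} \right]$$

and its average over d is

$$\bar{V} = \frac{1}{2r - h} \int_{-r}^{r-h} V(d)\, \mathrm{d}d = \pi h \left[ r^2 - \frac{(r - h/2)^2}{3} - \frac{h^2}{12} \right] = 3\pi\,(20.25 - 3 - 0.75) = 49.5\pi \ \mu\mathrm{m}^3$$

As a fraction of the full nuclear volume, (4/3)πr³ = 121.5π µm³, this gives

$$\frac{\bar{V}}{\tfrac{4}{3}\pi r^3} = \frac{49.5\pi}{121.5\pi} \approx 0.41$$

This sketch normalises by the length of the d interval, 2r − h, so it should be read as an illustrative model that is consistent with the 41% figure rather than as the FrenchFISH volume correction itself.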

Therefore, for method 1, it is unlikely that the section will contain many nuclei with a complete control probe count and the locus of interest is likely to be undersampled.

The goal of the analysis is to estimate the copy-number of a locus denoted by n, which we will achieve by volume-adjusting observed spot counts and using Poisson models.
For a specified section thickness h, we can express the volume of the nucleus sampled by a section in terms of d, the distance of the section edge from the nucleus midline:

By integrating over d and dividing by h, we can compute the average volume sampled:

This quantity can be used to scale the observed number of spots to get an estimate of the true number of spots:
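A numerical version of this volume correction is sketched below in Python. It relies on illustrative assumptions that are not taken from the paper: a spherical nucleus of radius r, a section lying fully inside the nucleus with d uniform over [−r, r − h], midpoint-rule integration over d, and a correction that divides observed counts by the mean represented fraction. All function names are hypothetical.

```python
import math


def sampled_volume(d, r, h):
    """Volume of a sphere of radius r lying between the planes z = d and z = d + h.

    Assumes -r <= d and d + h <= r, i.e. the section lies inside the nucleus.
    """
    return math.pi * h * (r**2 - (d + h / 2) ** 2 - h**2 / 12)


def mean_represented_fraction(r, h, steps=10_000):
    """Average of sampled_volume over d uniform on [-r, r - h] (midpoint rule),
    expressed as a fraction of the full nuclear volume 4/3 * pi * r^3."""
    lo, hi = -r, r - h
    width = (hi - lo) / steps
    mean_volume = sum(
        sampled_volume(lo + (i + 0.5) * width, r, h) for i in range(steps)
    ) / steps
    return mean_volume / (4 / 3 * math.pi * r**3)


def volume_corrected_count(observed_spots, r, h):
    """Scale an observed per-nucleus spot count by the mean represented fraction."""
    return observed_spots / mean_represented_fraction(r, h)


# 9 um nucleus (r = 4.5 um) in a 3 um section
frac = mean_represented_fraction(r=4.5, h=3.0)
print(f"represented fraction: {frac:.3f}")  # 0.407
print(f"2 observed spots -> {volume_corrected_count(2, 4.5, 3.0):.2f} spots")  # 4.91
```

For a 9 µm nucleus in a 3 µm section the represented fraction is about 0.41, so two observed spots correspond to roughly five spots in the whole nucleus under these assumptions.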

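As the observed spot counts are subject to both hybridisation and image signal processing noise, we use a probabilistic model that accounts for this uncertainty. We model the counts as coming from a Poisson distribution with rate λ. Given this, the likelihood of the observed spot counts x_1, …, x_m in m nuclei can be expressed as

$$P(x_1, \dots, x_m \mid \lambda) = \prod_{i=1}^{m} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} \tag{7}$$

To compute the posterior of λ given the data, we use Bayes' rule to transform the likelihood into

$$P(\lambda \mid x_1, \dots, x_m) = \frac{P(x_1, \dots, x_m \mid \lambda)\, P(\lambda)}{P(x_1, \dots, x_m)} \propto P(x_1, \dots, x_m \mid \lambda)\, P(\lambda)$$

Using the conjugate Gamma prior as P(λ) and the likelihood of Eq. 7, we sample from the posterior using the MCMCpack package [17] in R. From this sampling chain we then compute the expected rate, which is equal to the expected spot count:

$$\mathbb{E}[\lambda \mid x_1, \dots, x_m] \approx \frac{1}{S} \sum_{s=1}^{S} \lambda^{(s)}$$

where λ^(s) is the s-th of S posterior samples.

A Poisson process models a continuous series of events across space or time. In our setting, we consider spots as events and nuclear area a, measured in µm², as space. The number of spots in an area a is denoted by N(a) and modelled by a Poisson process with intensity λ_PP:

$$P\big(N(a) = k\big) = \frac{(\lambda_{PP}\, a)^k\, e^{-\lambda_{PP} a}}{k!}, \quad k = 0, 1, 2, \dots$$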
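For the manual-count model (Poisson likelihood with a conjugate Gamma prior), the posterior is itself a Gamma distribution, so it can be sampled directly. The Python sketch below shows this; it is an illustrative analogue of the MCMCpack-based step, not the FrenchFISH code, and the prior hyperparameters (alpha = beta = 1), the sample size, the seed and the function names are hypothetical choices.

```python
import numpy as np


def posterior_rate_samples(counts, alpha=1.0, beta=1.0, n_samples=10_000, seed=0):
    """Sample lambda from its Gamma posterior under a conjugate Gamma prior.

    Prior: Gamma(shape=alpha, rate=beta). Likelihood: i.i.d. Poisson(lambda)
    counts x_1..x_m. Posterior: Gamma(shape=alpha + sum(x), rate=beta + m).
    """
    x = np.asarray(counts)
    shape = alpha + x.sum()
    rate = beta + x.size
    rng = np.random.default_rng(seed)
    # NumPy parameterises the Gamma distribution by shape and scale = 1 / rate
    return rng.gamma(shape, 1.0 / rate, size=n_samples)


def expected_spot_count(counts, **kwargs):
    """Monte Carlo estimate of E[lambda | counts], the expected spot count."""
    return float(np.mean(posterior_rate_samples(counts, **kwargs)))


counts = [2, 3, 2, 4, 1, 2]  # hypothetical spot counts from six nuclei
print(expected_spot_count(counts))  # close to the exact posterior mean (1 + 14) / (1 + 6)
```

Because the posterior is available in closed form, the sample mean could be replaced by the exact value (alpha + Σx) / (beta + m); sampling is kept here only to mirror the sampling chain described in the text.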

Modelling uncertainty in automatic nuclear segmentation
Using the fitPP.fun function from the NHPoisson package [18] in R, we obtain a maximum likelihood estimate for λ_PP.

As λ_PP is an estimate of the number of spots per µm² of observed nuclear area, we obtain the estimated number of spots per nucleus by first multiplying by the average area of a nucleus, πr², and then scaling by the average nuclear volume represented in the tissue section, giving an estimate of the number of copies n:
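For a homogeneous Poisson process observed over a total area A containing N events, the maximum likelihood estimate of the intensity is N / A, so the automated estimate reduces to a few arithmetic steps. The Python sketch below does not call NHPoisson; it assumes per-nucleus spot counts and segmented nuclear areas (µm²) are already available and that the volume scaling divides by a mean represented fraction. The function and parameter names are hypothetical.

```python
import math


def pp_intensity_mle(spot_counts, nuclear_areas_um2):
    """MLE of a homogeneous Poisson process intensity (spots per um^2):
    total number of spots divided by total observed nuclear area."""
    total_area = sum(nuclear_areas_um2)
    if total_area <= 0:
        raise ValueError("total nuclear area must be positive")
    return sum(spot_counts) / total_area


def pp_copy_number(spot_counts, nuclear_areas_um2, r_um, represented_fraction):
    """Convert the intensity into a copy-number estimate.

    Multiply by the average nuclear area pi * r^2 to get spots per nucleus,
    then divide by the (assumed) mean fraction of nuclear volume in the section.
    """
    lam = pp_intensity_mle(spot_counts, nuclear_areas_um2)
    return lam * math.pi * r_um**2 / represented_fraction


# Hypothetical segmented nuclei: spot counts and areas in um^2
spots = [2, 1, 2]
areas = [60.0, 55.0, 65.0]
print(pp_intensity_mle(spots, areas))  # 5 spots over 180 um^2
print(pp_copy_number(spots, areas, 4.5, 49.5 / 121.5))
```

The value 49.5/121.5 used here is an illustrative represented fraction for a 9 µm spherical nucleus in a 3 µm section, not a parameter reported for FrenchFISH.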

To validate and benchmark FrenchFISH, we used the controlled scenario of a simulation study as well as a real-world case study in ovarian cancer.

Using these data we tested FrenchFISH's performance against the standard approach

To gain further insight, we examined accuracy and mean absolute error for both approaches under the same varying noise conditions (Figure 2d). Overall accuracy was poor for the standard approach except when the control probe was diploid and the true copy number was 1. FrenchFISH achieved high accuracy up to a true spot count of 4 and noise levels of 10%, but accuracy was poor when overcounting noise was 20%. Despite a deterioration in accuracy beyond true copy numbers of 4, FrenchFISH's mean absolute error never exceeded 1, meaning that its estimates were on average within one copy of the truth. In contrast, the standard approach had a mean absolute error of up to 7 under some conditions.

Benchmarking against overlapping nuclei

Here we assessed the performance of both methods across simulated tissue sections with varying degrees of nuclear overlap (Figure 3). Both methods were robust to nuclear overlap in the diploid control probe setting, including at 80% probability of overlap. However, the standard approach again showed more variable results as the true copy number increased (Figure 3b). The standard approach consistently failed to estimate the correct copy number when the control probe was not diploid; however, this error did not vary with degree of overlap (Figure 3c). FrenchFISH showed a mean absolute error no greater than one copy, whereas the standard approach showed errors of up to 2 copies in the diploid control probe setting and up to 7 copies in the non-diploid setting (Figure 3d).
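Accuracy and mean absolute error can be computed as below. This is a hypothetical Python sketch: the text does not define accuracy precisely, so here it is assumed to be the fraction of estimates that round to the true copy number. The example also shows that a mean absolute error below 1 bounds the average error, not every individual estimate.

```python
def accuracy(estimates, truth):
    """Fraction of estimates that, rounded to the nearest integer, equal the truth."""
    hits = sum(round(e) == t for e, t in zip(estimates, truth))
    return hits / len(truth)


def mean_absolute_error(estimates, truth):
    """Average absolute difference between estimated and true copy number."""
    return sum(abs(e - t) for e, t in zip(estimates, truth)) / len(truth)


est = [2, 2, 5, 4]       # hypothetical copy-number estimates
truth_cn = [2, 2, 2, 4]  # hypothetical true copy numbers
print(accuracy(est, truth_cn))             # 0.75
print(mean_absolute_error(est, truth_cn))  # 0.75, although one estimate is 3 copies off
```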
We performed both manual and automatic spot counting on multichannel FISH of tissue

setting also highlighted the difficulty in estimating high copy number states, with accuracy rapidly decreasing for copy numbers greater than 4. However, in all cases tested, FrenchFISH's estimates were on average within 1 copy of the underlying truth.

On the ovarian cancer tissue sections, 74% of FrenchFISH's automated spot count estimates were within 1 copy of the manually counted estimates. This indicates that FrenchFISH is a viable alternative to manual counting that could decrease analysis time fourfold with significantly less human intervention.

FrenchFISH is the first method specifically designed to provide quantitative copy number estimates from tissue section FISH without the need for a matched control probe.

Eight high-grade serous ovarian cancer samples were selected and reviewed by a pathologist who marked the area of each tumour on the H&E sections. In addition, four samples from two cases of ovarian squamous cell carcinoma arising in mature cystic teratoma were selected. Details of these cases (patients 7 and 11) have been published previously [19]. All paraffin blocks were sectioned at 3 µm on positively charged microscope slides.

• For each field of view, the position in the z-stack with the best focus was detected using Vollath's F4 measure [20].

• The 4 slices below and the 5 slices above this position were retained.

• A maximum intensity projection was taken across the retained slices to generate a single image for further processing.

• The contrast of each spot channel was normalised and adjusted, allowing saturation of up to 40% of the image. This allowed the weaker spot signals to be matched to the stronger, extranuclear noise spots.

2. Using R:

• Nuclear staining is segmented using the FISHalyzer package.

• Spot channel images are masked using nuclear segmentation.

• The image is filtered and normalised, retaining only the top 10% of signal intensity to remove remaining autofluorescence.

• A two-stage Gaussian blurring and automatic thresholding approach is applied, using the Intermodes [21] method for channels with precipitation signal and the Renyi Entropy [22] method for those without precipitation, both found in the autothresholdr package [23]. This combines and removes any small spot artefacts.

• A size-based filter is applied for final spot segmentation.
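The first image processing step (best-focus selection followed by a maximum intensity projection) can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the pipeline used in the study: Vollath's F4 is computed along the first image axis, slices "below" and "above" are taken to mean lower and higher z indices, the window is clipped at the stack bounds, and the function names are hypothetical. The R-based segmentation and thresholding steps are not reproduced.

```python
import numpy as np


def vollath_f4(image):
    """Vollath's F4 focus measure along axis 0:
    sum I(i, j) * I(i+1, j) minus sum I(i, j) * I(i+2, j). Higher means sharper."""
    img = np.asarray(image, dtype=np.float64)
    return float(np.sum(img[:-1] * img[1:]) - np.sum(img[:-2] * img[2:]))


def best_focus_projection(stack, below=4, above=5):
    """Select the best-focused z-slice by F4, keep `below` slices before and
    `above` slices after it (clipped to the stack bounds), and return the
    maximum intensity projection together with the best slice index."""
    stack = np.asarray(stack)
    scores = [vollath_f4(z_slice) for z_slice in stack]
    best = int(np.argmax(scores))
    lo = max(best - below, 0)
    hi = min(best + above + 1, stack.shape[0])
    return stack[lo:hi].max(axis=0), best


# Synthetic 12-slice stack: uniform "blurred" slices plus one sharp slice
stack = np.full((12, 8, 8), 0.5)
stack[6] = np.tile(np.array([1, 1, 0, 0, 1, 1, 0, 0], dtype=float)[:, None], (1, 8))
projection, best = best_focus_projection(stack)
print(best)  # 6
```

F4 rewards images whose neighbouring pixels are correlated but whose pixels two steps apart are not, which is why the sharp striped slice scores higher than the uniform ones.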