Concept and in-silico assessment of an algorithm for monitoring cytosolic fluorescent aggregates in cells

Autophagy is an evolutionary conserved pathway, by which eukaryotic cells degrade long-living cellular proteins and intracellular organelles, to maintain a pool of available nutrients. Impaired autophagy has been associated to important pathophysiological conditions, and this is the reason why several techniques have been developed for its correct assessment and monitoring. Fluorescence microscopy is one of these tools, which relies on the detection of specific fluorescence changes of targeted GFP-based reporters in dot-like organelles in which autophagy is executed. Currently, several procedures exist to count and segment this punctate structures in the resulting fluorescence images, however, they are either based on subjective criteria, or no information is available related to them. Here we present the concept of an algorithm for a semi-automatic detection and segmentation in 2D fluorescence images of spot-like structures similar to those observed under induction of autophagy. By evaluating the algorithm on more than 20000 simulated images of cells containing a variable number of punctate structures of different sizes and different levels of applied noise, we demonstrate its high robustness of puncta detection, even on a high noise background. We further demonstrate this feature of our algorithm by testing it in experimental conditions of a high non-specific background signal. We conclude that our algorithm is a suitable tool to be tested in biologically-relevant contexts.

quenched in the acidic environment following fusion of autophagomes with lysosomes. 23 To overcome this problem an improved autophagy indicator has been proposed, which 24 in addition to GFP includes a less-acid sensitive red-emitting fluorescent protein, which 25 signal can be observed in autolysosomes even after the disappearance of that one from 26 GFP [6]. 27 The analysis of the images obtained with fluorescent protein-based indicators 28 requires a reliable procedure for the segmentation and counting of cytosolic fluorescent 29 spots. This step is typically performed either manually, using commercially available 30 tools (Top Hat of MetaMorph, or G-Count by G-Angstrom, [1]), or the ImageJ plugin 31 WatershedCounting3D [7]. Unfortunately, information on how is segmentation carried 32 out could not be found (G-Count, WatershedCounting3D), or is subjective and depends 33 on the person conducting this analysis (manual analysis and the Top Hat algorithm). 34 Moreover, we could not find evaluations of the segmentation accuracy of every of these 35 algorithms. 36 Here we propose an algorithm for the identification of 37 autophagosome/autolysosome-like cytosolic spots using a dynamically-calculated 38 threshold value. Using more than 20 000 simulated images with different noise levels 39 and variable number and size of spots, we analyzed the performance of the algorithm, 40 and show the robustness of meaningful signal detection even in conditions of very high 41 background noise. To further test the algorithm performance, we applied it to analyze 42 images obtained in experiments with cells which expressed a tandem indicator in the 43 cytosol, i.e. displaying a higher non-specific cytosolic signal, that in conditions when the 44 indicator is directly targeted to the autophagosomes. Two-three days after transfection 45 cells expressing the cytosolic indicator displayed fluorescent punctate structures which 46 were successfully detected and segmented by the algorithm, even on a background of 47 high cytosolic fluorescence.  Simulated images were set to contain from 20 to 50 fluorescent spots of sizes between 82 0.1 and 1.2 µm, similar to data shown in [1,4,8].

83
The simulated images were generated using the approach proposed by Sinko and 84 collaborators in [9]. The next algorithm resume the steps for the simulated images  [10]. For this step we used the plug-in "Diffraction PSF-3D" of ImageJ. 3 Add Gaussian noise in order to consider auto-fluorescence and Poisson noise for the purpose of considering electronic noise. 4 Finally, the image is multiplied by √ 2 to consider amplification in a EM-CCD detector [9].
In the step 2 of the Algorithm 1 we consider the following parameters for the image 87 convolution: Index of refraction of the media n = 1.33, Numerical Aperture Width (pixels) = 512, Height (pixels) = 512. Rayleigh resolution 90 r = 0.61λ/N A = 240 nm. For more details of this plug-in see [10].

91
In Fig. 1 the steps of the Algorithm 1 for the generation of simulated images is 92 visually represented. The Fig. 1(d) can be considered as an example of the images that 93 were used to the validation of the proposed algorithm.
where M (m, n) is the image set of dimension m × n.

98
Puncta detection: For a cell C k ∈ {C 1 , C 2 , . . . , C K } of the image sequence I in a 99 time t, we denote this cell as C I tk and then perform the next steps: 100 1. Filter the cell : This step consists in thresholding the image I C I tk (denotes the image that contain only the cell C I tk ) to remove noise and atypical (low-intensity) values. The threshold ( ) is computed dynamically in dependence of the characteristics of each images, and as consequence, offers an adaptive way of elimination of noise and atypical values for each images (a detailed explanation is available in Section 1.5.1). The filtered image is defined as: 2. Compute the connected components: The connected components labeling is 101 used to detect connected pixels, corresponding to a single lysosome/puncta, 102 in the image I (i, j). For this, each pixel of the image is visited, and then external boundaries are created using pixel neighbors according to a specific 104 type of connectivity [12]. The results of this step are N connected 105 components A i , i = 1, 2, . . . , N , where each A i is identified as a single 106 lysosome/puncta. In this step it is assumed that cytosolic fluorescent 107 aggregates are composed of several pixels, and a digital path exists between 108 them. From the point of view of digital image processing, the entropy corresponds to the average of the information contained in the image, and is defined as: where B is the total number of bits of the digitized image I and p(x) = Kx m×n is the probability of occurrence of a gray-level value, K x is the number of times that the pixel with a value x is observed in I, m and n are the numbers of rows and columns of the image respectively, and as consequence m × n is the total number of pixels in the image. The minimum of entropy is achieved when the image itself is constant (all pixels have the same value) and in this case E(I) = log 2 1 = 0. The maximum entropy is reached when the pixel's values follow a uniform probability distribution function, this is, if B is the number of bits by pixel, then exists 2 B gray values in the image and the probability of each of them is exactly: If we consider the definition of entropy (3) and the condition (4), then: The above information provides the limits for the entropy function of a digital image of B bits per pixel: Considering the entropy function as a measure of the information contained in an Definition 1 (Entropy Information Criterion (EIC)) The entropy information criterion is defined as: where E(I) denote the entropy of the image I and B is the number of bits per pixels in 118 I.

126
The p-th percentil is obtained by calculating the ordinal rank, and then taking from the ordered list the value that corresponds to that rank. The ordinal rank n is calculated using the formula: where N are the total number of observations, p ∈ [0, 1] is the percent of the data less than Y (p). The operator · is the integer part, and is defined as: The value from the ordered list that corresponds to the ordinal rank is the percentile, The combination of the EIC (see Definition 1) and the computation of the percentile 129 provide a clear and easy way to determine the threshold of any image.

130
Definition 2 (Entropy Threshold Criterion (ETC)) Let I be an image with a colour depth of B bits per pixel, the Entropy Threshold Criterion of I is defined as: where EIC I is the "Entropy Information Criterion" defined in Definition 1 and Y (·) is 131 the EIC I percentile of the image pixels values.

132
The ETC provides a fast, simple and efficient alternative to compute the threshold 133 value for an image of any size and resolution. Also, it can be easily implemented in any 134 programming language and allows to calculate in a dynamic way the threshold value for 135 image sequences.

136
Applying this procedure to fluorescence imaging, we can automatically select in the 137 image those pixels with the highest intensity values. The threshold value defined in 138 the equation (2) is computed using the Definition 2. In the analysis of the performance of algorithm, it is important to take into account the relationship between the level of signal versus noise power. According to Gonzalez and Woods [14], if we denote by R(x, y) the image that only contains the signal and by C(x, y) the corrupted image (containing both signal and noise), then, the signal/noise ratios (SNR) can be computed as: Then, replacing (8) and (9) in the equation (7) we obtain: The images R(x, y) and C(x, y) have a size of m × n pixels. The SNR is expressed in 141 decibels (dB) and higher numbers correspond to better contrast, since there is more 142 useful information (the signal) than unwanted data (the noise). For example, when an 143 image have a SN R = 2dB, it means that the signal is 2 times higher than the level of 144 the noise. Note that with this definition of SNR it is possible to obtain negative values 145 of SNR, this situation occurs when the image contains more noise than signal.

False Positive (FP):
The organelle spot L S j ∈ L S I is a false possitive if: False Negative (FN): The organelle spot L j ∈ L I is a false negative if:

7/14
In each of the previous definitions (x Li , y Li ) and x L S j , y L S j denote the center of 158 mass of the organelles L i ∈ L I and L S j ∈ L S I respectively. Note that in each case 159 (x L − x L S ) 2 + (y L − y L S ) 2 is the euclidean distance between the center of mass of 160 the organelles L i and L S j .

161
Similarly to the work of Garcés and collaborators [15], the 0.5µm threshold was 162 chosen based on the resolution limit of optical microscopes, which in agreement with 163 Rayleigh's criteria, is r = (0.61λ)/N A, with λ being the emission wavelength of the 164 fluorophore and N A the numerical aperture of the objective. For the sake of simplicity, 165 we define a 0.5µm threshold as an approximation to the diameter of the zero-order Airy 166 ring, because in diffraction-limited images it does not make sense to segment structures 167 smaller than 2r.

168
The Fig. 2 represents the criteria described previously. A TP is returned when an 169 organelle is detected within a distance shorter than 0.5µm to that of a simulated spot. 170 A FP result is returned when the algorithm detects an organelle that does not exist in 171 the "ground truth" image. A FN is obtained when a organelles in the "ground truth"   The Jaccard index [16] is a measure of similarity for two sets of data, with a range from 0 to 1. The higher the value, the more similar the two sets are, and as consequence, is very easy to interpret. If we consider the sets L I and L S I , the Jaccard index is defined as: in our specific case, and considering the criteria presented before, we obtain that: where T P, F P and F N indicate the total number of true positive, false positive 176 and false negative respectively.

177
The Precision [17] (positive predictive value) is the fraction of organelles that were well detected (TP) among all the organelles found by the algorithm (TP+FP). It is defined as: The Recall (true positive rate or sensitivity) [17] in the context of our application is the fraction of organelles that have been detected by the algorithm over the real number of organelles in the "ground truth" image. It is defined as: The Precision and Recall take values in the interval [0, 1]. The higher values indicate a 178 better accuracy of the algorithm.

179
The performance of the algorithm was evaluated on more than 20 000 simulated approximately 97% of the organelles in the "ground truth" images were detected by the 193 algorithm for all the SNR. In general, the algorithm show a high robustness to noise, 194 even in cases when noise is much higher than the signal in the images (SN R ≤ 0dB).

195
The values obtained show that our procedure offers a very good detection level of the 196 organelles in the original images.  One of the advantages of the algorithm is the possibility to compute some properties 198 like the area, the mean intensity, the perimeter and the spatial position of each detected 199 PLOS 9/14 spot corresponding to an organelle. We use the information about the area of each 200 "ground truth " organelle to study the error in the adjustment of each organelle, so, we 201 can use this information to analyze the performance of the algorithm in fitting the real 202 size of a organelle.

203
Note 1 If L S ∈ L S I is a true positive detected organelle, then by definition (see the criteria above): then it is possible to compare the area of the organelles L i and L S . The error in the area of the organelles L and L S is defined as: where |·| denote the absolute value.  (12)) in function of the SNR. Even for the images with the highest noise values 206 (SN R ≈ −7.8dB) the error in the fitting is less than 0.1µm 2 , which is a very small 207 error and constitute a clear evidence that out algorithm have a good performance in the 208 adjustment of the organelles. As expected, the distribution functions of the area error 209 decreases while the level of noise is less in the images, the error in the area 210 approximation is lower (note that when the SN R = 3.88dB the maximum error 211 obtained in our simulation was around 0.08µm 2 ). Fig. 4(b) reveals a very small values 212 for the error of the displacement between the center of the organelles. In this case the 213 difference of the distribution functions between the lowest and highest values of SNR is 214 more pronounced than in Fig. 4 [20] (Fig. 6A). As shown in studies by other researchers, the low pKa 239 (pKa < 4.5, [21]) and proteolysis-resistance of mCherry, inherited from the parental 240 RFP, for which such properties have been demonstrated [1,20], was responsible for the 241 appearance of the bright red-emitting cytoplasmic puncta [22]. At the same time, 242 lysosomal sequestration of the mCherry-cpVenus construction resulted in almost 243 complete absence of yellow-green fluorescence in this acidic compartment, as expected 244 from the conditions present in the lumen of these organelles (pH 4.5, [23]) and the 245 relatively high pKa of Venus (pKa = 6.0, [24]), making it sensitive to low pH (Fig. 6B). 246 We used images obtained in these conditions to test the algorithm performance. An 247 example of manually isolated cells which were further automatically segmented by our 248 In the present work we introduced a segmentation algorithm for the analysis of punctate 253 structures similar to those observable in fluorescence images from studies on autophagy. 254 The evaluation of the algorithm segmentation accuracy on more than 20 000 images,