Contour, a semi-automated segmentation and quantitation tool for cryo-soft-X-ray tomography

Cryo-soft-X-ray tomography is being increasingly used in biological research to study the morphology of cellular compartments and how they change in response to different stimuli, such as viral infections. Segmentation of these compartments is limited by time-consuming manual tools or machine learning algorithms that require extensive time and effort to train. Here we describe Contour, a new, easy-to-use, highly automated segmentation tool that enables accelerated segmentation of tomograms to delineate distinct cellular compartments. Using Contour, cellular structures can be segmented based on their projection intensity and geometrical width by applying a threshold range to the image and excluding noise smaller in width than the cellular compartments of interest. This method is less laborious and less prone to errors from human judgement than current tools that require features to be manually traced, and does not require training datasets as machine-learning-driven segmentation would. We show that high-contrast compartments such as mitochondria, lipid droplets, and features at the cell surface can be easily segmented with this technique in the context of investigating herpes simplex virus 1 infection. Contour can extract geometric measurements from 3D segmented volumes, providing a new method to quantitate cryo-soft-X-ray tomography data. Contour can be freely downloaded at github.com/kamallouisnahas/Contour.

Impact Statement

More research groups are using cryo-soft-X-ray tomography as a correlative imaging tool to study the ultrastructure of cells and tissues but very few tomograms are segmented with existing segmentation programs. Segmentation is usually a prerequisite for measuring the geometry of features in tomograms but the time- and labour-intensive nature of current segmentation techniques means that such measurements are rarely made across a large number of tomograms, as is required for robust statistical analysis. Contour has been designed to facilitate the automation of segmentation and, as a result, reduce manual effort and increase the number of tomograms that can be segmented. Because it requires minimal manual intervention, Contour is not as prone to human error as programs that require users to trace the edges of cellular features. Geometry measurements of the segmented volumes can be calculated using this program, providing a new platform to quantitate cryoSXT data. Contour also supports quantitation of volumes imported from other segmentation programs. The generation of a large sample of segmented volumes with Contour that can be used as a representative training dataset for machine learning applications is a long-term aspiration of this technique.


The biology of cellular compartments has been extensively studied using high-resolution microscopy techniques. Transmission electron microscopy of thin sections of cells stained with heavy metals has been used for decades to produce images of intracellular ultrastructure and can resolve structures at the nanometer level (1). For precise quantitation, cellular compartments of interest need to be delineated from the other ultrastructural features by segmentation. These features can be segmented manually by tracing the edges of features with Segmentation Editor in Fiji (2), or with tools such as Amira (Thermo Scientific) that have 'intelligent scissors' that predict the boundaries of the object being traced by the user (3). However, these manual processes are time-consuming and the boundaries of the segmented volumes are prone to human interpretation (4). Automatic tools exist, but these also have limitations. For example, Bayesian matting, wherein a Bayesian framework is used to delineate foreground objects from the background based on pixel range, is less likely to successfully segment features with textured or thin edges (5). Similarly, 'magic wand' segmentation, in which pixels of a given range of intensities are segmented if they are all connected, is less applicable to features with a broad range of intensities and where there is high noise in the background (6,7). Watershed segmentation is often used to separate objects by estimating the boundaries between them based on the distances between their highest intensity maxima. However, the specificity of this technique is low in noisy datasets and can lead to over-segmentation, whereby many small segments are created within a single feature (8,9). As a result, segmentation tools that use machine learning and deep neural networks to distinguish features of interest from the rest of the ultrastructure have been developed for electron microscopy (e.g. Unet, Ilastik) (10-15). However, these tools require either a large representative training dataset or modified training for each micrograph.

The ultrastructural imaging technique known as cryo-soft-X-ray tomography (cryoSXT) has recently become accessible as a tool to cell biologists and pathologists to image the cellular compartments of unfixed whole cells in 3D (16,17). Moreover, cryoSXT is being used as a correlative imaging technique with cryo-structured illumination microscopy (cryoSIM) to identify features in cellular ultrastructure (18,19). X rays with a relatively low energy (~0.5 keV) (16), compared with those used for crystallography and medical imaging (~5-30 keV) (20,21), are used to illuminate the sample and transmission is reduced by absorption through carbon-rich structures, such as membranous cellular compartments. As a result, the signal in cryoSXT data appears dark due to X-ray absorption and the background appears light due to X-ray transmission. This technique is used to resolve cellular compartments to a theoretical resolution limit of 25 nm and produce 3D tomograms of whole-cell ultrastructure (17). CryoSXT imaging of cells and tissues takes 5-20 minutes and thus a large set of tomograms, each containing cellular compartments that need to be delineated by segmentation, can be collected in a relatively short interval (16). However, segmentation tools to mine information out of X-ray tomograms still need to be developed. One reason for this may be that X-ray tomograms are more difficult to segment than electron micrographs because the use of soft X rays to image the cell volume in 3D under near-native conditions produces higher noise and lower contrast than the heavy metal labelling used in electron microscopy (22). Although manual segmentation can be used to isolate features of interest, this is more time-consuming for 3D datasets that span the entire depth of the field of view within the cell (4). The development of machine learning tools for cryoSXT data could increase the rate and efficiency of segmentation. However, the resolution, density and morphology of features can vary widely between cryoSXT datasets (e.g. depending on collection date, passage number of cultured cells, sample preparation strategy, etc. (23)), and this lack of consistency may complicate the use of machine learning tools to segment tomograms. Currently, there is a lack of training datasets for machine learning in the form of segmented volumes from multiple tomograms. SuRVoS has been developed to circumvent the need for training datasets in this form. Instead, individual frames are segmented and used to train segmentation of the whole tomogram (4). However, this strategy involves training for each tomogram, which is time-consuming and does not keep pace with the high rate of cryoSXT tomogram acquisition.

Here we developed Contour, a semi-automated segmentation tool for cryoSXT. This tool can be used to segment high contrast features in cryoSXT tomograms, such as mitochondria, lipid droplets, and membranous features. This is achieved by a combination of thresholding based on the projection intensity (i.e. darkness) of the features and applying a width restriction based on the size of the features. This automated procedure can be performed globally (i.e. on the entire tomogram). Some features of interest may be excluded due to the strict width restriction, but segmentation of these features can be refined locally in smaller regions of interest. Contour was developed using Python 3.7 and is available for download on GitHub with example datasets included (github.com/kamallouisnahas/Contour). The segmentation approach used in Contour is faster than manual segmentation tools as it does not require laborious freehand drawing and interpolation like the Segmentation Editor available in Fiji (2).

Extracting quantitative data from cryoSXT datasets is a current challenge and Contour can be used to measure the volume of segmented elements as well as their width along their longest axis. Contour was designed to be used alongside existing segmentation tools: for features that are difficult to segment based on projection intensity and width in Contour (e.g. cytoplasmic vesicles) other segmentation tools can be used to generate segmented volumes that can be imported into Contour for quantitation. We have used Contour in a recent preprint to study how the morphology of mitochondria and cytoplasmic vesicles changes during infection with herpes simplex virus-1 (HSV-1) (24). We generated multiple segmented volumes with Contour and found that mitochondria became more elongated and vesicles reduced in width as the infection progressed (24). In this paper we discuss the algorithm and applications of this segmentation tool to cryoSXT data.

[...] iterations of the SIRT-like filter were applied to limit blurring and signal loss (26). Mitochondria have a low voxel intensity (high X ray absorbance) compared with the cytosol and an arbitrary threshold range determined by trial and error was used to segment them in a U2OS cell from an 8-bit reconstructed tomogram (Fig. 1A) (18). However, segmentation based solely on projection intensity was observed to be highly sensitive to voxel noise and non-specific features, such as the outline of the lipid droplets. In order to increase specificity, an additional segmentation parameter in Contour was used based on the width of the cellular compartments of interest (Fig. 1B). Segmentation was first performed on a complete reconstructed cryoSXT Z stack using the global segmentation algorithm in Contour. The segmentation was later refined in smaller regions using the local segmentation algorithm.

During global segmentation, the same threshold range applied in Figure 1A was applied to the tomogram in Figure 1B to isolate voxels of the desired intensity and to produce binary masks for each Z image (0 for background voxels and 1 for segmented voxels). A width restriction was determined by manually inspecting the width of the mitochondria and was applied in the second step to exclude noise and non-specific elements smaller in width than the mitochondria, such as the outline of lipid droplets. In order to apply this restriction without the slow process of iterating through each voxel, the binary masks were compressed in a lossless manner by run-length encoding (27). Using this compression method, the runs of voxel values (e.g. 000110000) in the binary mask were compressed into a sequence where the voxel value was coupled to the number of times it appeared consecutively (e.g. (0,3),(1,2),(0,4)).
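The thresholding and run-length encoding steps can be illustrated with a minimal Python sketch. This is not the Contour source code: the function names (threshold_row, run_length_encode, run_length_decode) are hypothetical, and treating the threshold range as inclusive at both ends is an assumption made for the example. The sketch reproduces the encoding of the run 000110000 given in the text.

```python
from itertools import groupby


def threshold_row(row, low, high):
    """Binary mask for one row: 1 where low <= intensity <= high, else 0.

    Assumption: the threshold range is inclusive at both ends.
    """
    return [1 if low <= value <= high else 0 for value in row]


def run_length_encode(mask_row):
    """Compress a binary row into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(mask_row)]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the full binary row."""
    return [value for value, length in runs for _ in range(length)]


mask_row = [0, 0, 0, 1, 1, 0, 0, 0, 0]
runs = run_length_encode(mask_row)
print(runs)  # [(0, 3), (1, 2), (0, 4)]

# The compression is lossless: decoding restores the original row.
assert run_length_decode(runs) == mask_row
```

Because each run carries its own length, a width restriction can be checked once per run rather than once per voxel, which is the efficiency argument made in the text.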
The width restriction was applied to the compressed sequence by converting voxels with a value of 1 to 0 if the number of consecutive voxels was lower than the desired width. The data compression and width restriction were applied twice independently along rows and columns in the horizontal and vertical directions, respectively, and the modified sequences were decompressed into two full binary masks. Voxels segmented within the threshold range were converted into background if their width was less than the width restriction. As a result, the segmented voxels that remained appeared as stripes with a width greater than or equal to the width restriction. The stripes were horizontal or vertical depending on the direction in which the width restriction was applied (Fig. 1B). The arrays of voxels that made up the horizontal and vertical binary masks were multiplied together such that only coordinates that contained a voxel of 1 in both masks (i.e. 1×1) were included in the product segmented volume and all other combinations were converted to background (i.e. 1×0, 0×1, and 0×0). This multiplication step eliminated most noise by ensuring that only rectangular matrices of dimensions width×width or larger remained. In some cases, horizontal and vertical stripes were produced from noise or non-specific features, such as the outline of lipid droplets. Voxels at the intersection between these stripes (i.e. 1×1) were also included after the multiplication step. The run-length encoding, width restriction, and data decompression were reapplied to the product segmented array to filter out these artefacts. The combined application of thresholding and a width restriction results in a better-defined segmentation with less noise and fewer non-specific elements. However, the increase in specificity afforded by width analysis can lead to some desired elements becoming excluded from the segmented volume.
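The width restriction, multiplication, and reapplication steps described above can be sketched for a single 2D binary mask as follows. This is an illustrative reading of the text and not the Contour implementation: all function names are hypothetical, masks are plain nested lists, and the second pass is assumed to repeat the horizontal and vertical filtering followed by multiplication.

```python
from itertools import groupby


def restrict_row(row, min_width):
    """Set runs of 1s shorter than min_width to 0 in one binary row."""
    out = []
    for value, run in groupby(row):
        length = sum(1 for _ in run)
        keep = value if (value == 0 or length >= min_width) else 0
        out.extend([keep] * length)
    return out


def restrict_horizontal(mask, min_width):
    """Apply the width restriction along each row."""
    return [restrict_row(row, min_width) for row in mask]


def restrict_vertical(mask, min_width):
    """Apply the width restriction along each column (via transposition)."""
    columns = [list(column) for column in zip(*mask)]
    filtered = [restrict_row(column, min_width) for column in columns]
    return [list(row) for row in zip(*filtered)]


def width_restrict(mask, min_width):
    """Multiply the horizontal and vertical masks so only 1x1 voxels remain."""
    horizontal = restrict_horizontal(mask, min_width)
    vertical = restrict_vertical(mask, min_width)
    return [[h * v for h, v in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal, vertical)]


def global_segment_slice(mask, min_width):
    """Two passes: the second removes voxels left at stripe intersections.

    Assumption: the reapplication uses both directions and multiplication.
    """
    return width_restrict(width_restrict(mask, min_width), min_width)


# A one-voxel-thick cross: each arm passes the restriction in one direction only.
cross = [[1 if (y == 2 or x == 2) else 0 for x in range(5)] for y in range(5)]
first_pass = width_restrict(cross, 3)
print(first_pass[2][2])  # 1 (the intersection survives the first multiplication)
print(sum(map(sum, global_segment_slice(cross, 3))))  # 0 (removed by the second pass)
```

The cross example mirrors the artefact described in the text: a horizontal and a vertical stripe each satisfy the restriction in one direction, so their intersection survives the first multiplication and is only removed when the restriction is reapplied.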
In the presented example, the global segmentation step excluded several areas based on the minimum width restriction (Fig. 1C). These areas could be filled by using the local segmentation algorithm in Contour, whereby thresholding and width restriction were applied locally in a smaller 3D region of interest containing these excluded areas (Fig. 1D) using a lower minimal width value (4 voxels). Given that local segmentation is performed on a smaller 3D region of interest, there is no requirement for data compression by run-length encoding before applying width restriction to improve analysis efficiency (27).

It is likely that local segmentation will be required following global segmentation. However, global segmentation of the complete Z stack is not required before performing local segmentations. If it is determined that the cytoplasm is too dense with high-contrast compartments to perform a global segmentation, this step can be skipped and local segmentations can be performed on the entire tomogram instead (Fig. 2A and Table 1). In addition to the global and local segmentation algorithms, manual 'fill' and 'erase' options are available for manual adjustment of the segmented volumes (Fig. 1C and D). The segmented volume can be rendered using 3D Viewer in Fiji (2) or other appropriate visualisation software (e.g. Amira (Thermo Scientific) or Chimera/ChimeraX (UCSF) (28)) (Fig. 1E).

[...] widths can be calculated. Any elements smaller in volume than a specified number of voxels can be filtered out and this can be used to eliminate small segments of noise in one step. (C) Final touches can be applied to improve the appearance of the segmented volumes. A smoothing function can be used to smoothen blocky edges in 2D slices and a Gaussian blur can be applied to reduce the appearance of layering in between slices of the segmented volume (Fig. 4).
[...]
Possible cause: The width restriction was too stringent at this region.
Solution: Apply a local segmentation to this region with a reduced width restriction.

Noise elimination

Problem: Too many small regions of noise (e.g. <1000 voxels) are present in the segmented volume.
Possible cause: Width restriction parameters were too permissive.
Solution: Noise can be eliminated altogether in one step using the filter function that eliminates segmented elements below a certain volume of voxels. The elements need to be differentiated as a prerequisite.

Appearance of segmented volume

Problem: The segmented elements have blocky edges.
Possible cause: A high minimum width restriction led to large width×width areas being produced in the segmented volume.
Solution: Apply the smoothing function to the segmented volume.

Problem: The segmented elements are too thin in the smoothened segmented volume.
Possible cause: Too many iterations of the smoothing function were applied, resulting in overtrimming of the edges.
Solution: Use fewer iterations (1 to 3 are recommended).

Problem: Contour lines are visible in a 3D render of the segmented volume.
Possible cause: The segmented volume was not smoothened or blurred.
Solution: Apply the smoothing function to the segmented volume and apply a Gaussian blur.

Applications of Contour to analyse geometry of cellular compartments
We have shown that mitochondria can be segmented using the global and local segmentation parameters based on their intensity and width (Fig. 1, 2, and 3A). We have used Contour to segment mitochondria in a recent preprint where we studied how mitochondrial morphology changes during HSV-1 infection. We found that mitochondria transitioned from a heterogeneous morphology in uninfected U2OS cells to a more consistently elongated and branched formation as the infection progressed (24). Contour can be used to segment other cellular compartments based on intensity and width, such as lipid droplets (Fig. 3B) and features at the cell surface or at cell-cell junctions, such as large internalisations of the plasma membrane that may resemble bulk endosomes arising from clathrin-independent endocytic events (Fig. 3C) (29).

Discrete segmented elements can be differentiated from each other and colour-coded to aid discrimination of the components (Fig. 2B and Fig. 4). This is achieved by assigning a common ID number to segmented voxels and their direct-contact neighbours. Direct-contact neighbours are any two voxels at XY coordinates that differ by one step in any of the eight cardinal (N, S, E, or W) and ordinal (NE, SE, SW, or NW) directions, or any two voxels at the same XY coordinate in adjacent Z planes.

Quantitation of the geometry of cellular features is a current challenge in cryoSXT because segmentation is often a prerequisite and measurements may need to be taken at an angle distinct from the slices of the 3D projection (16). Contour has the capacity to automatically calculate the volumes of cellular features (in units of voxels) along any axis once the user has differentiated these elements.
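The differentiation of elements by direct-contact neighbours, and the voxel-count volumes that follow from it, can be sketched as below. This is a hedged illustration and not Contour's code: the function names are hypothetical, the eight XY neighbours are assumed to lie within the same Z plane, and the breadth-first flood fill is simply one convenient way to propagate a common ID.

```python
from collections import Counter, deque


def label_elements(volume):
    """Assign a common ID to connected segmented voxels.

    volume[z][y][x] holds 0 (background) or 1 (segmented).
    Returns a nested list of the same shape with 0 for background.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    # Eight in-plane neighbours plus the voxels directly above and below.
    offsets = [(0, dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
               if (dy, dx) != (0, 0)]
    offsets += [(-1, 0, 0), (1, 0, 0)]
    next_id = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if volume[z][y][x] != 1 or labels[z][y][x]:
                    continue
                next_id += 1
                labels[z][y][x] = next_id
                queue = deque([(z, y, x)])
                while queue:
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in offsets:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        if (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                                and volume[pz][py][px] == 1
                                and not labels[pz][py][px]):
                            labels[pz][py][px] = next_id
                            queue.append((pz, py, px))
    return labels


def element_volumes(labels):
    """Volume of each element in voxels, keyed by element ID."""
    return dict(Counter(v for plane in labels for row in plane
                        for v in row if v))


volume = [
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]],
    [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]],
]
print(element_volumes(label_elements(volume)))  # {1: 3, 2: 1}
```

In this example the diagonal pair in the first plane and the voxel directly below it share one ID, while the isolated voxel receives its own. A volume in voxels can then be converted to physical units by multiplying by the volume of one voxel.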
For example, the mean volume of the mitochondria in a single 9.46×9.46 μm² field of view of a U2OS cell, given a voxel size of 10 nm³, was calculated to be 0.3 ± 0.48 μm³ (mean ± SD; Fig. 3D). The width of each segmented element along its longest axis, which may not be parallel with the slices of the tomographic projection, can also be calculated in this program. This is achieved by isolating the voxels at the perimeter of each segmented element in each image plane and calculating the distance (i.e. modulus) between every pair of these voxels across the complete Z stack. The longest of these moduli is presented as the width of the segmented element in units of voxels. The longest width of each lipid droplet was calculated for a 9.46×9.46 μm² field of view and the droplet width was found to be 1.04 ± 0.51 μm (mean ± SD; Fig. 3E). Segmented volumes generated with other segmentation tools, such as Segmentation Editor in Fiji (2), can be imported into Contour for quantitation based on the methods described above.

After the segmented elements have been differentiated, final touches can be applied to improve the appearance of the 3D volume (Fig. 4). The width restriction applied during the segmentation filters out any segmented voxels that do not form part of a width×width area or larger. As a result, segmented elements may appear blocky. A smoothing function is supplied to smoothen the edges of segmented elements (Fig. 4). Each segmented plane in the Z stack is converted into a binary mask (0 for background and 1 for segment) and is translated by one step in all eight cardinal and ordinal directions and the voxel arrays are added together such that voxels may have a value of 0 to 8. Voxels with a summed value of less than 5, which occur at the perimeter of segmented elements, are transformed into background, resulting in the trimming of the edges of the segmented elements.
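The longest-width measurement and the neighbour-sum smoothing described above can be sketched as follows. This is an illustration under stated assumptions and not the Contour implementation: the function names are hypothetical; a perimeter voxel is assumed to be one with at least one of its four in-plane neighbours outside the element; distances are measured between voxel centres with isotropic voxels; and smoothing is assumed to retain only voxels that were already segmented, with positions outside the image counted as background.

```python
from itertools import combinations
from math import sqrt


def perimeter_voxels(labels, element_id):
    """(z, y, x) of voxels on the in-plane edge of one labelled element."""
    edge = []
    for z, plane in enumerate(labels):
        ny, nx = len(plane), len(plane[0])
        for y in range(ny):
            for x in range(nx):
                if plane[y][x] != element_id:
                    continue
                neighbours = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
                if any(not (0 <= j < ny and 0 <= i < nx)
                       or plane[j][i] != element_id
                       for j, i in neighbours):
                    edge.append((z, y, x))
    return edge


def longest_width(labels, element_id):
    """Largest distance (in voxels) between any two perimeter voxels."""
    edge = perimeter_voxels(labels, element_id)
    return max((sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
                for p, q in combinations(edge, 2)), default=0.0)


def smooth_plane(mask, threshold=5):
    """One smoothing iteration: keep segmented voxels with >= threshold
    segmented neighbours among the eight surrounding positions."""
    ny, nx = len(mask), len(mask[0])
    out = [[0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            if mask[y][x] != 1:
                continue
            count = sum(mask[y + dy][x + dx]
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                        if (dy, dx) != (0, 0)
                        and 0 <= y + dy < ny and 0 <= x + dx < nx)
            out[y][x] = 1 if count >= threshold else 0
    return out


def smooth(mask, iterations=1):
    """Apply the smoothing iteration repeatedly to one 2D binary mask."""
    for _ in range(iterations):
        mask = smooth_plane(mask)
    return mask


# A 3x3 block inside a 5x5 plane: one iteration trims the four corners.
block = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)]
         for y in range(5)]
for row in smooth(block, 1):
    print(row)
```

Applying a second iteration to this small block removes it entirely, which illustrates why only a few iterations are advisable on thin elements. The pairwise distance search scales with the square of the number of perimeter voxels, so restricting it to the perimeter rather than all voxels keeps the calculation tractable.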
A greater number of iterations of this function increases the extent of smoothing but reduces the width of the segmented elements. A compromise of 1-3 iterations is recommended to avoid overtrimming [...]

We have applied Contour to one study, where we investigated how HSV-1 infection alters the morphology of cellular compartments, and we were able to segment mitochondria in multiple tomograms (24). The dependency on low projection intensity and width for the segmentation does pose some limitations. For example, some cellular compartments such as mitochondria may have uneven intensities. It is still possible to use Contour for these features, but successful analysis requires a greater number of local segmentations to be carried out with different threshold ranges (Table 1). The use of a width restriction parameter to distinguish features from noise complicates the application of this technique to thin cellular features, such as cytoskeletal filaments that are normally less than five voxels in width (30). Cytoplasmic vesicles often have a highly contrasting membrane but a light lumen, making it difficult to segment such features when applying a minimum width restriction. Although we did not use Contour to segment cytoplasmic vesicles in our recent study (24), we used Contour to calculate the longest width of each vesicle that we manually segmented using Segmentation Editor in Fiji (2). We therefore show that Contour can be used in conjunction with other segmentation tools to calculate quantitative data. Our semi-automated segmentation tool could be used to generate sufficient segmented volumes of different cellular compartments to facilitate training of machine learning algorithms in the future. CryoSXT is a growing technique and its applications are becoming more widespread in biomedical imaging, especially as a correlative imaging tool with cryoSIM (17-19). Contour is a largely automated segmentation tool designed to keep up with the pace of tomogram acquisition and to provide a new method for quantifying tomographic data.

[...] penicillin/streptomycin (10000 U/ml; Thermo Fisher Scientific, Cat# 15070063). 2 µL of gold fiducials (BBI Solutions; EM.GC250, batch 026935) were added to the grids as previously described (18) and the grids were blotted for 0.5-1 s at 30°C and 80% humidity with a Leica EM GP2 plunge freezer. The grids were plunged into liquid ethane and then transferred into liquid nitrogen. The tomograms presented in this paper were collected for a study of the effect of HSV-1 infection on the morphology of cellular compartments in U2OS cells (24). All tomograms shown here were collected from uninfected cells except for Fig. 3B, which was collected from a cell infected with 1 plaque forming unit per cell of HSV-1 as previously described (24).