Label-free three-dimensional analyses of live cells with deep-learning-based segmentation exploiting refractive index distributions

Visualisation and analysis of cells and their subcellular organelles are crucial for the study of cell biology. However, existing imaging methods require exogenous labelling agents, which prevents long-term assessment of live cells in their native states. Here we propose and experimentally demonstrate three-dimensional segmentation of subcellular organelles in unlabelled live cells, exploiting a 3D U-Net-based architecture. We present high-precision three-dimensional segmentation of the cell membrane, nuclear membrane, nucleoli, and lipid droplets in various cell types. We also use label-free segmentation to analyse the time-lapse dynamics of activated immune cells.


Introduction
There is a high demand for quantifying the morphological dynamics of live cells and their subcellular organelles across numerous research topics in quantitative cell biology [1,2]. Recent advances in microscopy have opened a new era for image-based quantification of cell volume [3-5]. Fluorescence-based confocal imaging is the most popular approach for live-cell quantification, offering a wide choice of organelle markers and corresponding fluorophores.

Quantitative phase imaging (QPI) is a powerful method for observing the morphology of a live specimen without perturbations such as dye staining or fluorescent protein expression [6]. Recently developed three-dimensional (3D) QPI techniques provide 3D refractive index (RI) distributions, which carry quantitative information on material concentration, and have been exploited in various applications including biomolecular condensates [7], biotechnology [8], microbiology [9], and cell biology [10]. Although a 3D QPI image provides the physical properties of each voxel, simultaneously monitoring the quantitative dynamics of a whole cell and its organelles requires a universal and versatile segmentation method. Specifically, techniques are needed to discriminate specific organelles within a cell and to separate each cell from its neighbouring cells.

To obtain such cell segmentation masks in 3D QPI, previous works have widely used conventional approaches such as threshold-based Otsu segmentation, transformation of a 3D image into a two-dimensional (2D) image by maximum intensity projection, and filtering for 3D organelle segmentation [11-13]. However, these algorithms can rarely be applied to organelle segmentation in QPI images, because intensity alone lacks organelle specificity and the numerical contrast varies little.
Because the RI is an intrinsic value determined exclusively by the concentration of the constituent material, RI ranges can easily overlap among different compartments within a cell.

In recent years, machine learning techniques based on cell and organelle morphology have been adopted to overcome these problems in 3D QPI. In particular, deep-learning approaches that learn from large amounts of data rather than from specific hand-crafted features have been used for nucleus segmentation [14], spermatozoon segmentation [15], and lipid droplet segmentation [16]. These semantic segmentation methods must be applied to single-cell images, because they provide no way of distinguishing individual cell units. To overcome this limitation, several studies have proposed cell-by-cell segmentation to track the immunological synapse of immune cells [17] or to analyse sperm cells [15]. However, because these methods mainly focus on specific cell types or organelles, their applicability is limited to a few analyses. For use in diverse applications, a robust model is needed that accurately segments individual cells and organelles across numerous cell types.

This study presents a universal framework for the label-free, quantitative analysis of live cells, with three major contributions. First, to simultaneously monitor the quantitative dynamics of whole cells and organelles, we propose an automatic segmentation framework for 3D QPI images that combines deep learning with cell characteristics. The framework consists of a "multi-organelle segmentation" model that segments multiple organelles within a cell and a "cell-by-cell segmentation" model that distinguishes individual cells from neighbouring cells.
Second, we verified that the model is spatio-temporally robust across numerous adherent and suspension cell lines that are popular among biologists. Rather than targeting a specific organelle, the framework learned the relationships among organelles within a cell by considering multiple organelles simultaneously. As a result, it performed stably even on a variety of cell types not used for training. In particular, because it relies on cell membrane and nuclear information, the cell-by-cell segmentation model worked across various cell lines without being limited to specific cell types. Finally, we demonstrate quantitative analyses of RAW 264.7 cells using morphological and biochemical properties, obtained by combining the proposed segmentation models with the linear relationship between RI and protein density. The results suggest that the proposed method offers a new analytical approach for automated cell studies.

Deep-learning-based multi-organelle and cell-by-cell segmentation model
The proposed analysis of live cells comprises two steps. First, we generated segmentation masks for individual cells and their organelles. Then, we obtained morphological and physical properties (e.g., volume, surface area, and concentration) from each segmentation mask and its RI values.

To this end, we used data-driven deep-learning techniques. Specifically, we used two different 3D convolutional neural networks, one for the multi-organelle segmentation model and the other for the cell-by-cell segmentation model, as depicted in Figure 1. The multi-organelle segmentation model predicts segmentation masks of four organelles from the input 3D RI tomogram (Figure 1a): the nucleus, nucleolus, plasma membrane, and lipid droplets (Figure 1b). We selected these four organelles because they are commonly used in cell analysis. By learning these tasks simultaneously, the model learns the characteristics of individual organelles and their relationships with each other. This multi-task learning reduces overfitting [18] and substantially reduces computation time compared with training a separate model for each task.

The cell-by-cell segmentation model divides the membrane mask of all cells into a separate mask for each cell. It uses the nucleus and membrane masks predicted by the multi-organelle segmentation model (Figure 1c). Assuming that each cell has at least one nucleus, the nucleus mask serves as the seed for separating individual cells, while the membrane mask distinguishes cell regions from non-cell regions. Details of the model are described in the Online Methods section.

We measured 129 3D QPI images of live NIH3T3 cells to train and evaluate the segmentation models described [...]

The Dice score was used to quantitatively measure the segmentation performance of the model.
This is the most frequently used metric in image segmentation; it quantifies the similarity between the ground-truth and predicted masks (Eq. 3). Across the five cell lines, the average Dice score was 0.758 for cell instance segmentation and 0.831 for membrane segmentation. For NIH3T3 and RAW 264.7 cells, the difference between the Dice scores of membrane segmentation and cell-by-cell segmentation was marginal. For the remaining cell lines, however, the Dice score of cell-by-cell segmentation was slightly lower than that of membrane segmentation; because these cell lines grow confluently, separating individual cells was much more difficult. Likewise, the Dice scores of nucleus segmentation for NIH3T3 and RAW 264.7 cells were higher than those for the remaining cell lines. The Dice score for nucleolus segmentation varied relatively little, because the RI of the nucleolus is similar across cell lines. The Dice score of lipid droplet segmentation was far lower than those of the other organelles, because lipid droplets are small compared with the other subcellular organelles: when the total volume of a mask is small, even a small number of false-positive and false-negative voxels can substantially reduce the Dice score.

Next, we conducted a qualitative assessment with experts (Figure 2). The cell-by-cell segmentation model performed well on the A549, MDA-MB-231, HeLa, and RAW 264.7 cell lines, which were not used for training. The five cell lines differ in the shapes of their subcellular organelles. NIH3T3 cells have a small, distinct nucleus, a small number of nucleoli, and a long, overlapping membrane structure. A549 cells also have a distinct, compact nucleus, but possess one or two large nucleoli and a thin membrane with large lipid droplets.
HeLa cells have a large, prominent nuclear membrane, a small number of nucleoli, and a thicker membrane than A549 cells. MDA-MB-231 cells have one or two large nucleoli. The morphology of RAW 264.7 cells differs completely from that of the other four cell lines: RAW 264.7 cells are smaller, have a spiky circular membrane, and their nucleus is large enough to occupy most of the cell. Applying the model to these cell lines with different characteristics, we observed that it worked well across them. Additionally, the masks produced by 3D cell segmentation clearly revealed the morphological features of cells and subcellular organelles (Figure 3).
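The sensitivity of the Dice score to small objects, noted above for lipid droplets, can be illustrated with a toy calculation. The sketch below uses synthetic cube-shaped masks (an assumption for illustration, not the paper's data) and adds the same 27 false-positive voxels to a large and to a small object: the score barely changes for the 8000-voxel object but drops to 2/3 for the 27-voxel one.

```python
import numpy as np


def dice(gt, pred):
    """Dice coefficient 2|G ∩ P| / (|G| + |P|) between two binary masks."""
    gt = np.asarray(gt, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    denom = gt.sum() + pred.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(gt, pred).sum() / denom


def cube_mask(shape, corner, size):
    """Binary volume containing a solid cube of edge `size` starting at `corner`."""
    mask = np.zeros(shape, dtype=bool)
    z, y, x = corner
    mask[z:z + size, y:y + size, x:x + size] = True
    return mask


shape = (40, 40, 40)
large = cube_mask(shape, (5, 5, 5), 20)    # 8000 voxels, e.g. a whole-cell mask
small = cube_mask(shape, (30, 30, 30), 3)  # 27 voxels, e.g. a lipid droplet
spurious = cube_mask(shape, (0, 0, 0), 3)  # 27 false-positive voxels, disjoint from both

print(round(dice(large, large | spurious), 3))  # 16000 / 16027 -> 0.998
print(round(dice(small, small | spurious), 3))  # 54 / 81 -> 0.667
```

The same absolute error costs the small object about a third of its score, which is consistent with the lower lipid-droplet Dice scores reported above.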

Quantitative cell analysis using segmented masks
To analyse live cells, we used volume, surface area, and concentration. Volume was computed by multiplying the total number of voxels in a segmentation mask by the volume of a single voxel of the 3D QPI image. To compute surface area, we constructed a triangular mesh from the 3D segmentation mask. Concentration was calculated as

$$C = \frac{1}{|M|} \sum_{i \in M} \frac{n_i - n_0}{\alpha} \tag{1}$$

where $M$ is the set of voxels in the segmentation mask and $i$ indexes its voxels; $n_i$ is the RI of voxel $i$; $n_0$ is the RI of the surrounding medium (1.3337); and $\alpha$ is the refractive index increment (RII), a constant set to 0.135 mL/g for lipid droplets and 0.19 mL/g for the remaining organelles.
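A minimal NumPy sketch of the volume and concentration computations follows. The voxel size, function names, and toy RI values are illustrative assumptions; the medium RI and the two RII values are taken from the text (with units of mL/g assumed), and the concentration is taken as the mean of (n_i - n_0)/RII over the voxels of the mask.

```python
import numpy as np

N_MEDIUM = 1.3337   # RI of the surrounding medium, as stated in the text
RII_PROTEIN = 0.19  # mL/g (assumed units), RII for all compartments except lipid droplets
RII_LIPID = 0.135   # mL/g (assumed units), RII for lipid droplets


def mask_volume(mask, voxel_size):
    """Volume of a binary mask: voxel count times the volume of one voxel.

    voxel_size is (dz, dy, dx) in micrometres, so the result is in um^3.
    """
    dz, dy, dx = voxel_size
    return int(np.count_nonzero(mask)) * dz * dy * dx


def mean_concentration(ri, mask, rii=RII_PROTEIN, n_medium=N_MEDIUM):
    """Mean of (n_i - n_0) / RII over the voxels i inside the mask, in g/mL."""
    values = ri[np.asarray(mask, dtype=bool)]
    if values.size == 0:
        raise ValueError("mask is empty")
    return float(np.mean((values - n_medium) / rii))


# Toy example with a hypothetical voxel size and a uniform RI inside the mask.
ri = np.full((4, 4, 4), N_MEDIUM)
mask = np.zeros((4, 4, 4), dtype=bool)
mask[1:3, 1:3, 1:3] = True        # 8 voxels
ri[mask] = N_MEDIUM + 0.019       # RI offset of 0.019 above the medium
print(mask_volume(mask, (0.2, 0.1, 0.1)))  # 8 voxels of 0.2 x 0.1 x 0.1 um^3 each
print(mean_concentration(ri, mask))        # approximately 0.1 g/mL
```

Surface area is not covered by this sketch. The text builds a triangular mesh from each mask; one possible route is scikit-image's `skimage.measure.marching_cubes` followed by `skimage.measure.mesh_surface_area`, although the source does not name the meshing tool that was used.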
Macrophages are white blood cells that play an essential role in the innate immune system. Macrophages [...] Figure 2). We tracked and calculated the changing parameters of RAW 264.7 cells for 8.5 h during activation, because these cells dynamically alter their shape during the initial response to LPS treatment.

3D RI tomograms of LPS-treated RAW 264.7 cells and untreated control cells were acquired in a label-free state every 30 min for up to 8.5 h on a 3D QPI microscope. 3D cell segmentation was performed on every time-lapse image, generating subcellular organelle masks (Figure 4a). The generated masks capture the changing phenotype of macrophages during activation. The membrane masks faithfully represented the spreading morphology of activated macrophages over time in 3D: as Figure 4a shows, the activated macrophages became larger and more widely spread. In addition to the xy slices, the yz and xz slices of the generated mask made the increase in cell volume easy to identify.

The nucleus, nucleolus, and lipid droplet masks were likewise obtained from label-free holographic images. Although the nucleus appears to grow along with the cell membrane, the nucleolus remained comparatively stable relative to the membrane and the nucleus. The most noticeable changes were [...] increased, it was confirmed that the volume and area of the nucleolus slightly increased at the beginning and then remained stable for 8.5 h. For the lipid droplets, whose RI is very high compared with that of other subcellular organelles (e.g., the nucleus and nucleolus), the RI remained stable throughout the 8.5 h of activation. The surface area and volume of lipid droplets increased during macrophage activation compared with the control cells, similar to the membrane and nucleus.
These observations suggest that LPS-induced changes in macrophages occur at the phenotypic level and manifest as physical changes. We compared the masks of individual subcellular organelles between the start and end points of the time-lapse data (Figure 4c). The mean RI of each organelle decreased rapidly over 8.5 h, whereas its volume and surface area increased. These results indicate that RAW 264.7 cells grow in size while their concentration decreases during activation, both in the whole cell and in subcellular organelles, including the nucleus, nucleolus, and lipid droplets.

The results show that the proposed framework, combining 3D QPI with deep neural network-based segmentation models, enables automated label-free 3D analysis of live cells. The framework predicts segmentation masks of organelles within individual cells and uses RI to provide physical and morphological information on cells and organelles. In particular, to segment each cell automatically, we assumed that each cell has at least one nucleus and used the predicted nucleus segmentation mask as seed information to distinguish individual cells. The framework does not target specific organelles; rather, it predicts multiple organelles concurrently. Trained to segment several organelles simultaneously, the model learned the relationships among organelles and performed stably even on cell lines not used for training. We also demonstrated that existing biological knowledge can be reproduced with the framework by automatically tracking and observing cell dynamics in time-lapse data. To the best of our knowledge, this work is the first of its type to analyse various 3D cell organelles [...]

The multi-organelle segmentation model takes a 3D RI tomogram as input and predicts binary masks of four organelles: the nucleus, nucleolus, membrane, and lipid droplets (Supplementary Figure 1). We train the model to predict the masks of all four organelles simultaneously, rather than training an independent model for each organelle. This approach, known as multi-task learning, improves overall performance across multiple tasks [18]. In addition, using a single model drastically reduced the training time. 3D RI tomograms are large, ranging from 100×600×600 to 260×860×860 voxels.
Consequently, training the subcellular organelle segmentation model on entire volumes would require a large amount of graphics processing unit (GPU) memory. Therefore, during training, we first resized each input 3D RI tomogram to 128×512×512 voxels and then randomly sampled 64×128×128 patches from the resized tomogram as inputs. For each subcellular organelle, the model predicted a probability map with the same size as the input patch.

During inference, we resized the input to 128×512×512 voxels and applied symmetric padding of 64 voxels. We cropped the input image about its centre along the z axis to a size of 64. Then, we uniformly generated 64×128×128 patches with a stride of 128 and obtained a probability map for each patch. We reconstructed the probability maps of the entire volume by stitching the predicted patches with a spline kernel. We then removed the padded area from the stitched probability maps and restored them to the original resolution. Finally, we obtained the segmentation mask by binarising the probability map at a threshold of 0.5.
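The inference-time tiling can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code: `predict_fn` is a hypothetical stand-in for the trained network, the resizing and z-cropping steps are omitted, padding is applied only along y and x as an assumption, and overlapping predictions are combined by uniform averaging rather than the spline kernel used in the text.

```python
import itertools

import numpy as np


def patch_starts(length, patch, stride):
    """Start offsets of windows of size `patch` that together cover [0, length)."""
    if length <= patch:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] + patch < length:
        starts.append(length - patch)  # extra window flush with the far edge
    return starts


def sliding_window_predict(volume, predict_fn, patch, stride):
    """Run `predict_fn` on every patch and average overlapping predictions.

    Uniform averaging stands in for the spline-kernel stitching in the text.
    """
    prob = np.zeros(volume.shape, dtype=np.float64)
    weight = np.zeros(volume.shape, dtype=np.float64)
    grids = [patch_starts(n, p, s) for n, p, s in zip(volume.shape, patch, stride)]
    for start in itertools.product(*grids):
        window = tuple(slice(o, o + p) for o, p in zip(start, patch))
        prob[window] += predict_fn(volume[window])
        weight[window] += 1.0
    return prob / weight


def segment(volume, predict_fn, patch, stride, pad, threshold=0.5):
    """Symmetric-pad y and x, tile, stitch, remove the padding, then binarise."""
    padded = np.pad(volume, ((0, 0), (pad, pad), (pad, pad)), mode="symmetric")
    prob = sliding_window_predict(padded, predict_fn, patch, stride)
    if pad > 0:
        prob = prob[:, pad:-pad, pad:-pad]
    return prob > threshold
```

With an identity `predict_fn`, `segment(volume, lambda p: p, patch=(8, 8, 8), stride=(8, 6, 6), pad=2)` simply thresholds the input at 0.5, which is a convenient check that every voxel is covered exactly once after averaging.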

Cell-by-cell segmentation
To predict the instance masks of cells, the cell-by-cell segmentation model uses segmentation masks of the [...] is the total number of nuclei. Because the nuclei of different cells are separated, the nucleus instance masks were obtained simply with a connected-component algorithm. We then selected the $i$th nucleus instance mask, $n_i$, as a positive map and took the remaining nucleus instances as a negative map. We concatenated the membrane mask with the positive and negative maps as the model input, predicting the instance probability map $x_i$ of the cell that contains the selected nucleus instance $n_i$.

During training, we randomly selected one nucleus instance and trained the model to predict the instance mask of the corresponding cell. During inference, the model repeated this process for each nucleus, and the final instance mask was obtained by assigning each voxel the index of the nucleus with the highest probability,

$$y = \arg\max_{i \in \{1, \dots, N\}} x_i,$$

where $N$ is the number of nuclei. A voxel was considered background if its highest probability was lower than the threshold of 0.5.

Unlike the subcellular organelle segmentation task, the cell-by-cell segmentation task cannot use the patching strategy, because the whole-cell shape is critical for predicting the instance mask. Therefore, we downsized the inputs to 128×128 along the x and y axes and cropped the resized inputs to a size of 64 along the z axis. Finally, we restored the predicted instance mask to the original size of the inputs.
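The seeding and assignment steps above can be sketched in NumPy. The breadth-first connected-component labelling, the 6-connectivity, the channel order of the model input, and the `model` callable are all assumptions made for illustration; `scipy.ndimage.label` is a faster practical choice for the labelling, and the resizing and z-cropping described above are omitted.

```python
from collections import deque

import numpy as np

# 6-connected neighbourhood in 3D (an assumption; the text does not state connectivity)
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def label_components(mask):
    """Label connected components of a 3D binary mask by breadth-first search."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=np.int32)
    current = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # voxel already belongs to a labelled component
        current += 1
        labels[seed] = current
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBOURS:
                nb = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = current
                    queue.append(nb)
    return labels, current


def cell_by_cell(membrane, nucleus_mask, model, threshold=0.5):
    """Label each voxel with the nucleus whose cell map x_i is most probable.

    `model` is a hypothetical callable mapping a (3, Z, Y, X) input stacked as
    [membrane, positive, negative] to a (Z, Y, X) probability map.
    Returns an integer volume where 0 is background and i is the cell of nucleus i.
    """
    nuclei, n = label_components(nucleus_mask)
    if n == 0:
        return np.zeros(nuclei.shape, dtype=np.int32)
    maps = []
    for i in range(1, n + 1):
        positive = nuclei == i                  # selected nucleus instance n_i
        negative = (nuclei > 0) & ~positive     # all other nucleus instances
        inputs = np.stack([membrane, positive, negative]).astype(np.float32)
        maps.append(model(inputs))
    maps = np.stack(maps)                                # shape (N, Z, Y, X)
    labels = maps.argmax(axis=0).astype(np.int32) + 1    # 1-based nucleus index
    labels[maps.max(axis=0) < threshold] = 0             # background below 0.5
    return labels
```

Running the model once per nucleus keeps each prediction a simple binary task, at the cost of N forward passes per volume.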

Network architecture and training details
We adopted a 3D U-Net-based architecture [26], which has shown strong performance in biomedical image segmentation tasks. Specifically, we employed the Scalable Neural [...]

We selected Leaky-ReLU [28] and instance normalisation [29] as the activation and normalisation functions of ScNas, respectively. The size of the initial feature map, the number of layers, and the feature map multiplier were set to 12, 8, and 3, respectively. Hyper-parameters of the network were tuned by grid search. The models were implemented in Python 3.7 using the PyTorch 1.4 framework on a machine with eight V100 (32 GB) GPUs. Several data augmentation strategies were applied, including random flipping, cropping, and rotation. In particular, the input image was randomly rescaled by a factor between 0.5 and 2 to handle the varying resolutions of 3D RI tomograms. We used the Adam optimiser [30] with a learning rate of 0.001 and reduced the learning rate by a factor of 5 if the validation metric did not improve for 30 epochs. To train the models, we combined the Dice loss and the binary cross-entropy (BCE) loss:

$$\mathcal{L} = 1 - \frac{2\sum_{i=1}^{N} p_i g_i}{\sum_{i=1}^{N} p_i + \sum_{i=1}^{N} g_i} - \frac{1}{N}\sum_{i=1}^{N}\left[g_i \log p_i + (1 - g_i)\log(1 - p_i)\right] \tag{2}$$

where $N$ is the number of voxels, and $p_i$ and $g_i$ denote the predicted probability and the ground-truth label of the $i$th voxel, respectively. For the multi-organelle segmentation task, the model performs multiple tasks simultaneously; thus, the loss was calculated for each subcellular organelle and the losses were summed. For cell-by-cell segmentation, we simply computed the loss between the selected instance mask and its prediction.
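A NumPy sketch of the combined Dice and BCE loss, summed over organelle channels as described for the multi-organelle task, is given below. The authors used PyTorch; this framework-free version, including the small `eps` added for numerical stability, is an assumption for illustration only, and a training implementation would express the same terms with tensor operations so that they can be back-propagated.

```python
import numpy as np


def dice_bce_loss(prob, target, eps=1e-7):
    """Dice loss plus mean binary cross-entropy for one binary mask."""
    p = np.asarray(prob, dtype=np.float64)
    g = np.asarray(target, dtype=np.float64)
    # Soft Dice loss: 1 - 2 * sum(p * g) / (sum(p) + sum(g))
    dice_loss = 1.0 - 2.0 * np.sum(p * g) / (np.sum(p) + np.sum(g) + eps)
    p_safe = np.clip(p, eps, 1.0 - eps)  # avoid log(0) in the BCE term
    bce = -np.mean(g * np.log(p_safe) + (1.0 - g) * np.log(1.0 - p_safe))
    return float(dice_loss + bce)


def multi_organelle_loss(probs, targets):
    """Sum of per-organelle losses; axis 0 indexes the organelle channels."""
    return sum(dice_bce_loss(p, g) for p, g in zip(probs, targets))
```

Summing rather than averaging the per-organelle losses gives every organelle equal weight regardless of how many organelles are predicted; for the cell-by-cell model the same `dice_bce_loss` would be applied to a single selected instance.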

Metrics
For quantitative evaluation, we adopted the Dice coefficient, which measures the similarity between a predicted mask and the corresponding ground-truth mask:

$$D(G, P) = \frac{2\,|G \cap P|}{|G| + |P|} \tag{3}$$

where $G$ is the ground-truth mask and $P$ is the predicted mask. For the cell-by-cell segmentation task, the Dice coefficient $D(G_i, P_j)$ was computed for all pairs of ground-truth and predicted instance masks. We then applied the Hungarian algorithm [31] to assign each prediction $P_j$ to a ground-truth instance $G_i$ such that the total Dice score of the matched pairs was maximised.
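The instance-level evaluation can be sketched with SciPy's `linear_sum_assignment`, which solves the same linear assignment problem as the Hungarian algorithm. The label-volume input format and the handling of unmatched instances (simply left out of the result) are assumptions, since the text does not specify them.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def dice(a, b):
    """Dice coefficient between two binary masks."""
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom


def match_instances(gt_labels, pred_labels):
    """Match predicted to ground-truth instances, maximising the total Dice score.

    Both inputs are integer label volumes in which 0 is background.
    Returns {gt_id: (pred_id, dice)} for every matched pair.
    """
    gt_ids = [int(i) for i in np.unique(gt_labels) if i != 0]
    pred_ids = [int(j) for j in np.unique(pred_labels) if j != 0]
    scores = np.zeros((len(gt_ids), len(pred_ids)))
    for r, i in enumerate(gt_ids):
        for c, j in enumerate(pred_ids):
            scores[r, c] = dice(gt_labels == i, pred_labels == j)
    # One-to-one assignment that maximises the summed Dice of matched pairs
    rows, cols = linear_sum_assignment(scores, maximize=True)
    return {gt_ids[r]: (pred_ids[c], float(scores[r, c])) for r, c in zip(rows, cols)}
```

Because predicted instance IDs are arbitrary, matching on Dice rather than on ID is what makes the per-cell scores meaningful.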

LPS treatment in RAW 264.7 cells and plating on TomoDish
To 3 mL of Dulbecco's Modified Eagle Medium supplemented with 10% FBS and 1% antibiotic-antimycotic, 30 µL of LPS from Escherichia coli (100 µg/mL stock, List Biological Laboratories) was added. Then, 3.0 × 10⁵ RAW 264.7 cells (a mouse macrophage cell line) were counted, transferred to a 15 mL tube, and centrifuged at 100 × g for 5 min to collect the cell pellet. The supernatant medium was removed by suction, and the cell pellet was gently resuspended in 3 mL of the LPS-containing medium. The 3 mL of medium containing RAW 264.7 cells was then transferred to a TomoDish (Tomocube, Inc.).

The 3D QPI
The 3D RI images of cells were obtained using a commercial holotomography system (HT-2H, Tomocube Inc., Republic of Korea) based on Mach-Zehnder interferometry and equipped with a digital micromirror device (DMD).
A coherent monochromatic laser (λ = 532 nm) was split into a reference beam and a sample beam using a 2×2 single-mode fibre coupler. A 3D RI tomogram was reconstructed from multiple 2D holographic images acquired under 49 illumination conditions: normal incidence and 48 azimuthally symmetric directions at a polar angle of 64.5°. The DMD was used to control the angle of the illumination beam impinging on the sample [32]. The light diffracted by the sample was collected with a high-numerical-aperture (NA) objective lens (NA = 1.2, UPLSAP 60XW, Olympus). To compensate for the missing-cone problem caused by the limited NA, a regularisation algorithm based on non-negativity was used [33]. The off-axis hologram was recorded with a complementary metal-oxide-semiconductor (CMOS) image sensor (FL3-U3-13Y3MC, FLIR Systems). The 3D RI maps and their correlative 3D fluorescence signals (shown in red pseudo-colour) were visualised using commercial software (TomoStudio™, Tomocube Inc.). Details of the principle and reconstruction algorithms can be found elsewhere [34,35].

Time-lapse imaging using a holotomography microscope
Prior to 3D QPI imaging, the HT-2H (Tomocube, Inc.) was turned on at least 30 min in advance to warm up the laser. Additionally, the carbon dioxide (CO2) gas mixer and temperature controller were turned on to maintain a temperature of 37°C and an atmosphere of 5% CO2 in the TomoChamber (Tomocube, Inc.