Label-free multiplexed microtomography of endogenous subcellular dynamics using generalizable deep learning

Simultaneous imaging of various facets of intact biological systems across multiple spatiotemporal scales is a long-standing goal in biology and medicine, for which progress is hindered by limits of conventional imaging modalities. Here we propose using the refractive index (RI), an intrinsic quantity governing light–matter interaction, as a means for such measurement. We show that major endogenous subcellular structures, which are conventionally accessed via exogenous fluorescence labelling, are encoded in three-dimensional (3D) RI tomograms. We decode this information in a data-driven manner, with a deep learning-based model that infers multiple 3D fluorescence tomograms from RI measurements of the corresponding subcellular targets, thereby achieving multiplexed microtomography. This approach, called RI2FL for refractive index to fluorescence, inherits the advantages of both high-specificity fluorescence imaging and label-free RI imaging. Importantly, full 3D modelling of absolute and unbiased RI improves generalization, such that the approach is applicable to a broad range of new samples without retraining to facilitate immediate applicability. The performance, reliability and scalability of this technology are extensively characterized, and its various applications within single-cell profiling at unprecedented scales (which can generate new experimentally testable hypotheses) are demonstrated. Jo et al. develop a broadly applicable deep-learning approach to predict fluorescence (FL) based on label-free refractive index (RI) measurements, ‘RI2FL’ (RI to FL). The trained model can be used across cell types without retraining.

Imaging is the process of mapping a variable, called contrast, in space and time. The trade-offs between different contrast mechanisms fundamentally determine the distinct characteristics of each imaging modality 1 . In biomedicine, fluorescence (FL) has been used as a canonical optical imaging contrast for several decades to visualize specific elements within biological systems and is powered by chemical, immunological and genetic labelling strategies 2 . Despite the excellent biochemical specificity, however, a number of drawbacks are associated with FL. These include photobleaching and phototoxicity (which limit the temporal window), spectral overlap (which limits multiplexing), variability of labelling quality (which limits reproducibility and introduces potential bias) and exogenous labelling-induced side effects (which may perturb endogenous biology).
Recent advances in machine learning have triggered an interesting approach known as cross-modality inference or in silico labelling. This technique achieves FL contrast by measuring another contrast with complementary characteristics [3][4][5][6][7][8][9][10] . Notably, the pairing of images from simultaneous FL and bright-field (BF; based on absorption contrast) or differential interference contrast (DIC; based on phase contrast) microscopy was used to train neural networks to convert BF or DIC images into FL images 3,4 (termed BF2FL and DIC2FL, respectively). Despite the ingenuity of these initial works to enable computational staining of unlabelled samples, which often showed remarkable performance even for 3D subcellular structures 4 , the inherent drawbacks of these conventional modalities (that is, minimal absorption at the cellular level in BF and limited quantitative phase information in DIC 1,11 ) sparked a crucial ongoing quest for better contrast mechanisms for cross-modality inference (Supplementary Table 1).
A promising approach utilizes quantitative optical phase delay as the measured contrast [7][8][9][10] . The emerging quantitative phase imaging (QPI) technologies measure the phase images of unlabelled samples with high sensitivity 11 , which could be paired with FL images for cross-modality inference. Two-dimensional (2D) QPI as a new contrast mechanism has demonstrated improved performance compared to predecessors at the cellular level 9 . However, this class of methods based on 2D phase imaging is fundamentally limited by the coupling of RI and sample thickness, which renders the pixel-wise phase values morphology-dependent. To date, this approach has demonstrated inference of gross structures (for example, nuclei 9 and neurites 10 ), which can often be identified by simple visual inspection, but not non-trivial subcellular structures. Furthermore, the trained inference models were not suitable for generalizing to new samples and required retraining the models from scratch for every single application 9 .
In this study, we report a data-driven technology for label-free multiplexed microtomography of endogenous subcellular structures and dynamics across various spatiotemporal scales. We fundamentally improve the cross-modality inference framework by introducing the contrast mechanism of the 3D RI, which is an endogenous quantity governing light-matter interaction including both absorption and phase delay. The diffraction-limited 3D RI tomograms, measured by a 3D QPI technique called high-speed multi-angle holography (or holotomography), enable the scalable inference of multiple 3D FL tomograms for the corresponding subcellular targets (termed RI2FL). Importantly, we show that the use of absolute and unbiased RI (decoupled from sample morphology) and its full 3D modelling enable RI2FL to generalize across cell types and thus be readily applicable to a broad range of new samples without retraining. We extensively characterize the performance and scalability of RI2FL and integrate accompanying tools for spatiotemporal uncertainty quantification and single-cell profiling for practical applications. We demonstrate that this broadly applicable pipeline enables high-throughput single-cell profiling of living cells at unprecedented scales to generate previously unseen and experimentally testable hypotheses.

Results
Data-driven discovery of RI-FL relations. We sought to determine the quantitative relations that link RI distribution to subcellular targets in a data-driven manner by training 3D convolutional neural networks to translate RI tomograms into FL tomograms corresponding to multiple subcellular targets (Fig. 1a). We first created a large-scale dataset consisting of ~1,600 3D RI tomograms (at 532 nm wavelength) and the corresponding 3D FL tomograms from 6 subcellular targets (actin, mitochondria, lipid droplets, plasma membranes, nuclei and nucleoli) and from 6 eukaryotic cell types (NIH3T3, COS-7, HEK 293, HeLa, MDA-MB-231 and astrocytes), all in live cells, using standardized holotomographic microscopes equipped with FL channels 12 (Methods). To train the networks, we used a subset of NIH3T3 tomograms only and held out all other data to test the generalization of the discovered RI-target relations across cell types (Supplementary Table 2).
The distinct nature of RI and FL presents several experimental and computational challenges. First, while RI is an absolute and unbiased quantity independent of the experimenter or instrument, FL signals heavily depend on the labelling quality, the illumination power and the exposure time 1,2 . To address this, we implemented tight quality control procedures carried out by trained cell biologists throughout the data acquisition and processing pipeline, thereby establishing ground-truth subcellular targets defined by 3D FL (Methods). Second, the drastic differences between the FL channels would ordinarily require the painstaking optimization of individual target-specific network architectures. We instead utilized a single, highly flexible network architecture for all subcellular targets, powered by a large-scale neural architecture search, which was also potentially advantageous for extension to additional subcellular targets (Extended Data Fig. 1a). Third, high-resolution 3D RI tomograms have enormous memory demands that are infeasible for most graphics processing units. To avoid this limitation, we assumed that the target-specific patterns could be identified from local (~10 μm) distributions of RI and we implemented patch-based parallel processing (Extended Data Fig. 1b). With these strategies in hand, we successfully trained the networks for RI2FL inference (Fig. 1b and Supplementary Video 1).
Performance, generalization and RI-specific advantages. We characterized the performance of RI2FL by quantitatively comparing the inferred and ground-truth FL in the held-out dataset. The prediction accuracy across cell types is illustrated in Fig. 2a. Strikingly, not only NIH3T3 cells but also all other cell types, which were never presented to the networks during training, showed high performance. In particular, excellent accuracy for astrocytes, obtained from primary cultures unlike other immortalized cell lines, strongly supported that RI2FL captured fundamental RI-target relations that were generalizable across cell types. The per-target performance is presented in Fig. 2b. While the high accuracy for nuclei and lipid droplets is consistent with the high RI contrast of these targets 13,14 , all the remaining targets, which are difficult to recognize by visual inspection of RI tomograms, showed comparable performances (see Supplementary Table 3 for additional performance metrics). Next we tested the performance of RI2FL to detect salient local structures, which are critical in most practical imaging studies, by detecting and comparing the discrete subcellular structures in the inferred and ground-truth FL data (Extended Data Fig. 2). We confirmed that ~90% of ground-truth structures could be matched to the inferred structures with a mean inter-centroid distance of 0.32 µm, and the intensity values within the matched structures had a strong positive correlation (Pearson's correlation coefficient, r = 0.77). In addition, to rule out labelling-induced bias of the dataset (which is unlikely due to the low density of fluorophores 15 ), we performed RI2FL with unstained samples and then stained the same samples to obtain the corresponding FL ground-truth; this experiment showed consistent results (Extended Data Fig. 3). Furthermore, we characterized the sensitivity of FL inference performance to RI reconstruction error (Extended Data Fig. 4). Together, our results confirmed the seamless identification of endogenous subcellular targets by RI2FL.
Next, we quantified the advantage of RI compared to BF and DIC 3,4 . Because RI represents the complete optical information, incorporating both absorption and phase-delay information, one can reconstruct stacked BF and DIC images from RI tomograms, but not vice versa, using Fourier optics 1,11 (Supplementary Note 1). We trained BF2FL and DIC2FL networks with the simulated stacks and the corresponding FL tomograms and compared the performance of BF2FL, DIC2FL and RI2FL (Fig. 2c,d). There was a considerable margin between RI2FL and the other methods. This is not surprising because the optical information in stacked BF or DIC images is only a subset of the full RI information 11 . Our observation in 3D is consistent with the previous findings in 2D based on direct measurement of BF, conventional-phase and quantitative-phase images 9 .
Assessing reliability via uncertainty quantification. Despite the exciting opportunities provided by RI2FL in addition to previous cross-modality inference approaches, the extent to which we can 'trust' the model predictions in space and time is unclear. We aimed to elucidate this by applying advances in Bayesian deep learning 16,17 , which have been recently introduced in the related fields 18,19 , to cross-modality inference. To this end, we quantified the uncertainty maps to guide the end-users with 'error bars' accompanying the FL predictions. Intuitively, uncertainty can be estimated as the voxel-wise variability of predictions on perturbation of the data or model (Extended Data Fig. 5a,b; note that we used a different aleatoric uncertainty quantification technique 17 , not used in the aforementioned studies 18,19 , for more flexibility in designing the reconstruction loss function). An example that demonstrates uncertainty quantification in RI2FL is presented in Fig. 3a (see also Extended Data Fig. 5c,d and Supplementary Video 2). During animal cell division, the nuclear envelope breaks down to facilitate the separation of aligned chromosomes by the spindle apparatus. This specific event, which is rare in the training dataset, makes nuclei prediction by RI2FL particularly challenging around the periphery of the nuclei. The specific increase in uncertainty at this stage (Fig. 3a, red arrow) casts a cautionary signal for downstream analyses. While this particular example with nuclei was also observed in a different 2D application 19 (thus confirming a biological but not an application-specific origin of the uncertainty), uncertainty quantification can provide end-users with spatiotemporal reliability measures under various circumstances on top of the holistic accuracy metrics. The uncertainty maps can also guide data collection to strengthen the model 20 .
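The idea of estimating uncertainty as the voxel-wise variability of predictions under model perturbation can be illustrated with a minimal sketch. It assumes a Monte Carlo procedure in which a stochastic predictor is evaluated repeatedly on the same RI volume; `toy_predict` is a hypothetical stand-in for a network with dropout and is not the trained RI2FL model, and the number of samples is an arbitrary choice.

```python
import numpy as np

def mc_uncertainty(predict, ri_volume, n_samples=20, seed=0):
    """Voxel-wise mean and standard deviation over repeated stochastic predictions."""
    rng = np.random.default_rng(seed)
    samples = np.stack([predict(ri_volume, rng) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

def toy_predict(ri_volume, rng, drop_prob=0.2):
    """Hypothetical stand-in for a dropout network: randomly zeroes voxels, then rescales."""
    keep = rng.random(ri_volume.shape) >= drop_prob
    return (ri_volume - 1.337) * keep / (1.0 - drop_prob)

# Toy volume: medium-like background with a block of higher RI standing in for a cell.
ri = np.full((4, 8, 8), 1.337)
ri[:, 2:6, 2:6] = 1.37
mean_fl, uncertainty = mc_uncertainty(toy_predict, ri)
```

In this toy case the standard deviation is zero in the background and positive inside the block, which is the kind of spatial 'error bar' map described above.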
Scalable inference across multiple spatiotemporal scales. RI2FL is intrinsically scalable in space and time because the RI-target relations are shift- and time-invariant. The trained models, together with the patch-based processing (Extended Data Fig. 1b), can be readily applied to the large field-of-view (FOV) RI tomograms obtained by image stitching or high space-bandwidth product techniques 21,22 . We successfully operated RI2FL for tomograms with large FOVs up to 480 × 480 × 13 μm³ without trading off the high spatial resolution (Fig. 3b). In addition, RI2FL can be sequentially applied to time-lapse large FOV tomograms, as we demonstrated in recordings up to 72 h (Fig. 3c and Supplementary Video 3). Importantly, there is no theoretical upper limit in the spatial and temporal scales for holotomography because it is free from photobleaching and phototoxicity (see an example experiment in Fig. 4); the only practical limitations are computing time and memory, which linearly scale with the data dimensions.
Single-cell profiling of intact living cells at scale. An exciting application of RI2FL is time-resolved hybrid single-cell profiling for use in cell biology and high-throughput screening. The image-based profiling of single cells with standardized data acquisition and interpretable feature extraction has provided insights into new phenotypes and cellular heterogeneity, thereby complementing genomics (for example, Cell Painting with CellProfiler 23 ). While this capability is critically dependent on highly multiplexed FL imaging, the spectral overlap issue limits such measurement to fixed cells via multi-round imaging. Meanwhile, multiplexed microtomography with RI2FL in intact living cells enables the time-resolved profiling of single-cell phenotypes. Furthermore, RI contributes additional information orthogonal to FL.
Traditionally, RI has been a uniquely suitable modality for ultrasensitive quantification of subcellular mass 24 , which is particularly relevant for the study of cell cycle and growth 25,26 . This quantitative nature of RI can be synergistically combined with the specificity of FL to access new dimensions for single-cell profiling ( Fig. 5a and Methods). In contrast to a previous approach 6 (based on 2D reflection-based cross-modality inference), our pipeline provides access to full 3D information for both FL  Table 4). We defined a minimally redundant set of 65 features that spanned the representative facets of the single cells and quantitatively validated the RI2FL-inferred feature values using the ground-truth FL (r = 0.97 across all features and FL channels; Extended Data Fig. 6), while one can readily define thousands of single-cell features with the current level of multiplexing 23 . The unsupervised low-dimensional embedding of the 65 single-cell features revealed an intriguing variability across and within the cell types 27 (Fig. 5b). Three exemplary features underlying this variability are shown in Fig. 5c. Astrocytes generally had large cell volumes, which is consistent with their complex stellar morphology. NIH3T3 cells, which are fibroblasts with characteristic actin structures, showed a high actin density. HEK 293 cells had a high mass density, which can be attributed to the high rates of protein production by this cell type. Notably, for these three and all the other features, an enormous variability was observed within each cell type, which is partially dependent on the cell cycle, making single-cell profiling attractive.
Time-resolved high-dimensional profiling of single cells. Next, we proceeded to time-resolved interrogations. For proof of concept, we carried out a series of perturbation experiments. First, NIH3T3 fibroblasts were stimulated with platelet-derived growth factor (PDGF) to promote cell growth in a physiological manner 28 while observing the cells with a high volume rate (0.8 s per volume). This was made possible by the high speed of holotomography without photobleaching and phototoxicity. The inferred FL channels clearly visualized lamellipodia formation (Fig. 5d, white arrows) and actin reorganization in 3D at a time scale of minutes ( Fig. 5d and Supplementary Video 4). The time-resolved profiling was able to measure the fast dynamics of the features in response to PDGF stimulation (Fig. 5e). This measurement represents an unprecedented regime in cell biology that has been previously inaccessible due to technical limitations such as rapid photobleaching over time (Fig. 4). The temporal resolution can be readily improved beyond the video rate 29 .
To determine the roles of distinct signalling pathways, we specifically targeted RhoA, a Rho family small GTPase downstream of PDGF, using chemogenetics 30 . The rapamycin-induced formation of the FKBP-FRB-rapamycin complex recruited constitutively active RhoA to the plasma membrane (Fig. 5f and Supplementary Video 5). Unlike PDGF, RhoA stimulation specifically promoted the formation of actin stress fibres, which resulted in characteristic cell morphology. Consistently, only a subset of the features illustrated in Fig. 5e showed similar dynamics to PDGF stimulation (Fig. 5g).
Application to a new dataset without model retraining. This single-cell profiling pipeline, powered by the unprecedented generalization of RI2FL, could be readily applicable to various RI tomogram datasets from new cell types, conditions and microscopes. Here, we reanalysed a dataset previously collected using a different microscope 14 . In particular, RI tomograms of macrophage-derived, lipid-droplet-rich foam cells were measured after treatment with a panel of targeted nanodrugs for high-throughput atherosclerosis drug screening 14 (Fig. 6). The application of RI2FL to this dataset retrieved cellular features consistent with the previous study 14 (for example, a drug-induced decrease in the amount of the intracellular lipid droplets; feature 1). Furthermore, the results also suggested an unexpected condition-dependent intranuclear mass redistribution (feature 2) and actin remodelling (feature 3), which could guide more specific follow-up experiments. Taken together, our results demonstrate that RI2FL provides a powerful means by which to quantitatively profile single living cells via simultaneous access to a variety of information to rapidly generate previously unseen and experimentally testable hypotheses.
Exploring the limits of RI2FL. Finally, we attempted to explore the limitations of the technique by comparing inferred structures with related untrained structures. Specifically, we hypothesized that endosomes might be incorrectly predicted as lipid droplets due to their similar morphologies, and thus expected a high overlap between FL-labelled endosomes and inferred lipid droplets. However, the experiments revealed that the intracellular distribution of the inferred lipid droplets was starkly different from that of the ground-truth endosomes (Extended Data Fig. 7; also note the distinct RI colocalization levels). In fact, lipid droplets and endosomes seemed to be almost mutually exclusive in space, which suggests that there is a potential regulatory mechanism that mediates lipophagy 31  this route for future investigations using the present technology (see Discussion for an alternative approach). Again, spatiotemporal uncertainty quantification could be a useful tool to guide this path.

Discussion
In summary, we developed and extensively characterized RI2FL, which is a scalable framework to infer endogenous subcellular structures and dynamics from 3D RI tomograms. The high performance and improved generalization of this approach was a result of the full 3D modelling of RI, which encompasses both absorption and phase-delay information and is fully decoupled from the morphology of the sample. Together with the uncertainty quantification schemes used to measure the prediction reliability in space and time, RI2FL represents a powerful platform technology for cell biology and high-throughput screening, as demonstrated by its capacity for unprecedented single-cell profiling and data-driven hypothesis generation (see Supplementary Table 1 for a detailed comparison of the related cross-modality inference methods).
The high-dimensional observation and perturbation of single-cell dynamics at scale can facilitate a systems-level understanding of cellular behaviour and decision-making. So far, most studies of systems biology have relied on snapshot (for example, transcriptomics or multi-round imaging) or low-dimensional time-series (for example, one or two fluorescence biosensors) measurements of cellular states, which leads to critical limitations in the ability to infer the network-based logic governing cellular dynamics. With high-dimensional time-series measurements (or single-cell state space trajectories) in hand, one can directly infer the underlying dynamical systems at the single-cell level, analogously to systems neuroscience 33 . We are currently applying this approach to search for the dynamical phenotypes specific to clinically relevant cellular malfunctions (for example, cancer) or high-efficacy drug actions 34 .
We believe that this study, at least partly, addresses a long-sought goal of the label-free imaging community: biochemical specificity by RI. While the overlapping RI values of many subcellular structures have precluded RI-based imaging beyond nuclei and lipid droplets, we have previously proposed that the spatial distribution of RI may encode enough information to infer these structures 35 , and we have experimentally demonstrated this in the present study. We look forward to extending this technology to tissues and ultimately to in vivo applications, synergizing with new approaches to 3D QPI in highly scattering systems 36,37 . An important next step would be to reverse-engineer the trained models to interpret the discovered RI-target relations 6,38-40 (Extended Data Fig. 8). At the moment, the feasibility of RI2FL for a new target can be tested only empirically by target-by-target training and characterization (as we tried with lipid droplets and endosomes). The interpretability may reveal general principles governing light-matter interactions in biological systems and clarify the fundamental limits of RI2FL as well as other cross-modality approaches.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41556-021-00802-x.
The fluorescence labelling strategies were as follows. Actin was stained by phalloidin (A12379, Invitrogen) or genetically labelled by expressing mCherry-Lifeact (constructed by inserting the F-actin peptide-encoding sequence into the mCherry-C1 vector). Mitochondria were stained by MitoTracker Red CMXRos (M7512, Invitrogen). Lipid droplets were stained by LipiDye (FDV-0010, Funakoshi). Plasma membranes were stained by CellMask (C10046, Invitrogen) or genetically labelled by expressing GFP-MEM (based on KRAS4b plasma membrane localization signal peptide). Nuclei were stained by Hoechst 33342 (H3570, Life Technologies). Nucleoli were genetically labelled by expressing fibrillarin-mCherry (BC019260.1). Endosomes were genetically labelled by expressing GFP-ENDO (based on RhoB GTPase that predominantly localizes in the early and recycling endosome). For the chemogenetic stimulation experiment, Lyn-FRB and YFP-FKBP-RhoA(CA) were co-expressed in NIH3T3 cells. All genetic labelling processes were driven by CMV promoters. The cells were transfected via electroporation (Neon Transfection System, Invitrogen) using a voltage of 1,280 V, a pulse width of 20 ms and 2 pulses.
Holotomography-optimized cell culture imaging dishes (TomoDish, Tomocube) were seeded with approximately 450,000 cells per dish. The culture dishes were coated with 0.01% poly-d-lysine for 15 min, washed three times with distilled water and fully dried before use.
Imaging and perturbation. Holotomography was conducted using standardized microscopes (HT-2H, Tomocube) implementing optical diffraction tomography (ODT) to solve the RI-thickness coupling problem in phase images. The principles and implementations of ODT have been extensively reviewed elsewhere 11 . The specific implementation used here was based on holographic field retrieval with multi-angle Mach-Zehnder interferometry using coherent 532-nm laser light steered by a digital micromirror device. The 3D RI tomograms were reconstructed by mapping the measured field information to the 3D Fourier space and filling the missing cone using 40 iterations of the non-negativity-constrained missing cone recovery algorithm 41 . The acquisition time for a single volume was less than 1 s. In addition, we optionally utilized the three FL channels to measure the ground-truth 3D FL tomograms using stacked wide-field acquisition with a ~0.3-μm step size followed by 3D deconvolution 12 (excitation centre wavelengths of 385 nm, 470 nm and 565 nm). Excitation light intensity and exposure time were manually adjusted by trained cell biologists to clearly visualize the target structures at consistent intensity levels, and all RI-FL paired tomograms were visually inspected by at least two researchers. We note that future generations of standardized QPI microscopes equipped with diffraction-limited fluorescence channels, such as confocal, light-sheet or structured illumination microscopy, will enable RI2FL to accomplish its full potential because the resolution of the inferred FL tomograms is limited by that of the ground-truth FL tomograms in all cross-modality inference methods. For the perturbation experiments, the concentrations of PDGF (PDGF-BB, PeproTech) and rapamycin (Calbiochem) in the imaging medium were 10 nM and 0.5 μM, respectively. Detailed experimental protocols on holotomographic imaging are available elsewhere 15,42 .
Data processing. All tomograms were resized to have a voxel size of 0.15 × 0.15 × 0.2 μm³ before inference. The default FOV used in training and evaluation was 512 × 512 × 64 voxels, which corresponds to a volume of 76.8 × 76.8 × 12.8 μm³. This volume was then further subdivided for patch-based processing. The large FOV RI tomograms were obtained by offline 3D stitching after the rapid acquisition of slightly overlapping tiles of FOVs. Specifically, we estimated the tile-to-tile displacement using the phase correlation of the single FOV RI tomograms, and the overlapping volumes were processed by image blending. The RI values were clipped into a range between 1.337 and 1.390. While we targeted an identical FOV for the RI and FL tomograms by sharing most of the optical path, we noted a small axial discrepancy due to the intrinsic differences between the modalities (for example, aberration and latency). To secure voxel-wise correspondence to facilitate supervised learning and evaluation, we estimated and corrected this discrepancy using the axial cross-correlation of the RI and FL tomograms. All the tomograms were manually inspected using ImageJ (National Institutes of Health) after registration to be included in the dataset. It is worth noting that this registration procedure is needed only for training and evaluation, not for subsequent inference. For simulating BF and DIC stacks, we used custom scripts written in Matlab (MathWorks). Detailed procedures are described in Supplementary Note 1.
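Two of the preprocessing steps described above, RI clipping and axial registration by cross-correlation, can be sketched as follows. This is a simplified illustration rather than the authors' pipeline: it assumes that z is the first array axis, that only integer slice shifts are considered, and that correlating the mean z-profiles with a circular shift is an adequate proxy for the full axial cross-correlation. The function names and the `max_shift` parameter are hypothetical.

```python
import numpy as np

RI_MIN, RI_MAX = 1.337, 1.390  # clipping range stated in the text

def clip_ri(ri):
    """Clip RI values into the stated range."""
    return np.clip(ri, RI_MIN, RI_MAX)

def axial_shift(ri, fl, max_shift=5):
    """Integer z-shift s such that np.roll(fl, s, axis=0) best matches ri.

    Assumption: volumes are ordered (z, y, x), and a circular shift of the
    mean-subtracted z-profiles approximates the axial cross-correlation.
    """
    p_ri = ri.mean(axis=(1, 2))
    p_fl = fl.mean(axis=(1, 2))
    p_ri = p_ri - p_ri.mean()
    p_fl = p_fl - p_fl.mean()
    shifts = list(range(-max_shift, max_shift + 1))
    scores = [float(np.dot(p_ri, np.roll(p_fl, s))) for s in shifts]
    return shifts[int(np.argmax(scores))]

# Toy example: an FL volume displaced by three slices relative to the RI volume.
z = np.arange(32)
profile = np.exp(-((z - 10) ** 2) / 8.0)
ri = 1.337 + 0.03 * profile[:, None, None] * np.ones((32, 4, 4))
fl = np.roll(ri - 1.337, -3, axis=0)
shift = axial_shift(ri, fl)
fl_registered = np.roll(fl, shift, axis=0)
```

In practice a non-circular shift with cropping would avoid wrap-around at the volume boundaries; the circular version is used here only to keep the sketch short.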
Model design, training and inference. We used a single network architecture for all subcellular targets to avoid optimizing individual target-specific architectures. We automatically designed a highly flexible architecture through a scalable neural architecture search (SCNAS) 43 . Specifically, SCNAS utilizes a stochastic sampling algorithm in a gradient-based bi-level optimization framework to jointly search for the optimal network parameters at multiple levels with generic 3D medical imaging datasets. As a result, a U-Net-like encoder-decoder structure with skip connections was discovered (Extended Data Fig. 1a). At the end of every microlevel architecture, known as a motif, we added a dropout operation. The network parameters were as follows: activation function, leaky ReLU; normalization function, instance normalization; size of initial feature map, 12; number of layers, 8; feature map multiplier, 3.
To promote precise inference of both the large- and small-scale structures in the FL tomograms, for network training, we used a loss function, $l$, with both mean squared error (MSE), $l_{\mathrm{MSE}}$, and gradient difference loss (GDL), $l_{\mathrm{GDL}}$, terms: $l = l_{\mathrm{MSE}} + l_{\mathrm{GDL}}$. Each term is defined as follows:

$$l_{\mathrm{MSE}}(y, \hat{y}) = E\left[(y - \hat{y})^2\right], \qquad l_{\mathrm{GDL}}(y, \hat{y}) = E\left[\left(h(y) - h(\hat{y})\right)^2\right]$$

where $y$ and $\hat{y}$ are the ground-truth and inferred FL channel, respectively, $h(\cdot)$ is the 3D Sobel operator and $E(\cdot)$ is the expectation over voxels and operations.
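A minimal NumPy sketch of a loss with an MSE term and a Sobel-based gradient difference term is given below. Several details are assumptions made for illustration: the gradient term is taken as the squared difference of the Sobel responses, the Sobel kernels are unnormalized, the volume borders are handled by edge replication, and the expectation is a plain mean over voxels and over the three directional operators. The function names are hypothetical, and a training implementation would use a differentiable framework instead.

```python
import numpy as np

def _filter_along(x, kernel, axis):
    """Correlate x with a 3-tap kernel along one axis, replicating the edges."""
    pad = [(0, 0)] * x.ndim
    pad[axis] = (1, 1)
    xp = np.pad(x, pad, mode="edge")
    n = x.shape[axis]
    out = np.zeros(x.shape, dtype=float)
    for k, w in enumerate(kernel):
        out += w * np.take(xp, np.arange(k, k + n), axis=axis)
    return out

def sobel3d(x):
    """Stack of the three directional 3D Sobel responses (unnormalized)."""
    x = np.asarray(x, dtype=float)
    responses = []
    for d in range(3):
        g = x
        for axis in range(3):
            # derivative along direction d, smoothing along the other two axes
            kernel = (-1.0, 0.0, 1.0) if axis == d else (1.0, 2.0, 1.0)
            g = _filter_along(g, kernel, axis)
        responses.append(g)
    return np.stack(responses)

def mse_gdl_loss(y, y_hat):
    """Sum of the mean squared error and the Sobel gradient difference term."""
    l_mse = np.mean((y - y_hat) ** 2)
    l_gdl = np.mean((sobel3d(y) - sobel3d(y_hat)) ** 2)
    return l_mse + l_gdl

rng = np.random.default_rng(0)
y = rng.random((6, 8, 8))
loss_same = mse_gdl_loss(y, y)
loss_offset = mse_gdl_loss(y, y + 0.1)  # a constant offset changes only the MSE term
```

The second example shows why the two terms are complementary: a uniform intensity offset is penalized by the MSE term alone, whereas blurred or displaced edges would mainly raise the gradient term.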
To train the networks, we used an Adam optimizer with an initial learning rate of 0.001, whereby the learning rate was reduced by a factor of 5 if there was no improvement in the validation metrics for 30 epochs. Randomly sampled parameters were used for data augmentation techniques such as flip, rotation, cropping, elastic deformation and gamma correction. Hyperparameter optimization was based on a grid search algorithm whose search space consisted of hyperparameter combinations with similar memory and FLOPS requirements 44 . While we trained a single final network per target due to the strong generalization of the technique, in principle, one could implement transfer learning for fine-tuning the network parameters for weak-generalization datasets. We used PyTorch in Python 3 to implement the deep-learning pipeline.
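The learning-rate rule described above (initial rate 0.001, reduced by a factor of 5 after 30 epochs without improvement) can be sketched in plain Python. The class name is hypothetical, and the sketch assumes that a lower validation metric is better and that the stale-epoch counter resets after each reduction. PyTorch provides `torch.optim.lr_scheduler.ReduceLROnPlateau` with similar behaviour, although its patience counting differs slightly from this sketch, and the text does not state which scheduler the authors used.

```python
class PlateauSchedule:
    """Reduce the learning rate when the validation metric stops improving."""

    def __init__(self, lr=1e-3, factor=0.2, patience=30):
        self.lr = lr
        self.factor = factor      # 0.2 corresponds to a reduction by a factor of 5
        self.patience = patience  # epochs without improvement before reducing
        self.best = None
        self.stale_epochs = 0

    def step(self, val_metric):
        """Update with a validation metric (lower is better); return the current rate."""
        if self.best is None or val_metric < self.best:
            self.best = val_metric
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                self.lr *= self.factor
                self.stale_epochs = 0
        return self.lr

# A validation metric that never improves after the first epoch.
schedule = PlateauSchedule()
rates = [schedule.step(1.0) for _ in range(31)]
```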
Due to the memory constraints of graphics processing unit computing, we trained the networks using 3D patches instead of entire tomograms. During training, the patches were randomly cropped from regions with registered FL data. For post-training inference, an RI tomogram was symmetrically padded, divided into overlapping patches with regular spacing, individually processed by the networks and then stitched into whole FL tomograms with spline kernel-based blending (Extended Data Fig. 1b). The default size of a patch was 256 × 256 × 64 voxels.
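The pad, tile, infer and blend sequence can be sketched as follows. This is a simplified illustration: a triangular (linear B-spline) weight window stands in for the spline kernel used in the paper, the patch, stride and padding sizes are toy values rather than the 256 × 256 × 64 default, and the function names are hypothetical.

```python
import numpy as np

def window_1d(n):
    """Triangular weights, strictly positive and peaking at the patch centre."""
    x = np.arange(n, dtype=float)
    c = (n - 1) / 2.0
    return 1.0 - np.abs(x - c) / (c + 1.0)

def starts(length, patch, stride):
    """Regularly spaced start indices; the last patch sits flush with the end."""
    s = list(range(0, length - patch + 1, stride))
    if s[-1] != length - patch:
        s.append(length - patch)
    return s

def infer_tomogram(ri, model, patch=(8, 8, 4), stride=(4, 4, 2), pad=(2, 2, 1)):
    """Patch-wise inference with weighted blending (pad widths must be > 0)."""
    padded = np.pad(ri, [(p, p) for p in pad], mode="symmetric")
    w = (window_1d(patch[0])[:, None, None]
         * window_1d(patch[1])[None, :, None]
         * window_1d(patch[2])[None, None, :])
    acc = np.zeros(padded.shape, dtype=float)   # weighted sum of patch outputs
    norm = np.zeros(padded.shape, dtype=float)  # sum of weights per voxel
    for i in starts(padded.shape[0], patch[0], stride[0]):
        for j in starts(padded.shape[1], patch[1], stride[1]):
            for k in starts(padded.shape[2], patch[2], stride[2]):
                sl = (slice(i, i + patch[0]),
                      slice(j, j + patch[1]),
                      slice(k, k + patch[2]))
                acc[sl] += w * model(padded[sl])
                norm[sl] += w
    out = acc / norm
    return out[pad[0]:-pad[0], pad[1]:-pad[1], pad[2]:-pad[2]]  # remove padding

# An identity function stands in for a trained network.
ri = np.random.default_rng(0).random((12, 10, 6)) + 1.33
fl = infer_tomogram(ri, model=lambda p: p)
```

Because every voxel is divided by the sum of the weights it received, an identity model reproduces the input exactly, which is a convenient sanity check before plugging in a real network.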
Performance and uncertainty quantification. Three performance metrics were used: peak signal-to-noise ratio (PSNR), Pearson's correlation coefficient (PCC) and structural similarity index (SSIM). Each metric is defined as follows:
$$\mathrm{PSNR}(y,\hat{y}) = 10 \log_{10}\frac{1}{l_{\mathrm{MSE}}}$$
$$\mathrm{PCC}(y,\hat{y}) = \frac{\sigma_{y\hat{y}}}{\sigma_{y}\sigma_{\hat{y}}}$$
$$\mathrm{SSIM}(y,\hat{y}) = \frac{(2\mu_{y}\mu_{\hat{y}} + c_{1})(2\sigma_{y\hat{y}} + c_{2})}{(\mu_{y}^{2} + \mu_{\hat{y}}^{2} + c_{1})(\sigma_{y}^{2} + \sigma_{\hat{y}}^{2} + c_{2})}$$
where $\mu$ and $\sigma$ are the mean and standard deviation, respectively, $\sigma_{y\hat{y}}$ is the covariance, and $c_{1} = 0.01^{2}$ and $c_{2} = 0.03^{2}$ by default. Following convention, we used a non-zero minimum standard deviation for PCC and a 3D Gaussian kernel with a size of 7 voxels for SSIM. The metrics were complementary to each other. PSNR is relevant for the MSE, but suffers from poor perceptual performance and noise vulnerability. PCC has an adequate perceptual performance, but minimally captures local differences. SSIM, which is relatively complicated, is a comprehensive metric designed to overcome the shortcomings of PSNR or PCC. SSIM can be factored into three terms called luminance, contrast and structure. The luminance term is similar to MSE but uses mean values instead of voxel values. The contrast term quantifies the similarity of high-frequency components relevant to the GDL. The structure term is nearly identical to PCC. Various quantifications of the three metrics are shown in Supplementary Table 3.
For segmentation of lipid droplets, we first transformed the maximum intensity projection (MIP) of the inferred lipid droplets channel to a probability map using a random forest pixel classifier provided by Ilastik 45 . Then we segmented the lipid droplets with a probability threshold of 0.9 and size-filtered segments with a size threshold of 0.5 μm². We note that the RI2FL framework can be repurposed for improved segmentation of QPI data 13,46 , although this application is beyond the scope of the present study.
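The three performance metrics can be sketched in NumPy as follows. Two simplifications are assumed: intensities are normalized so that the peak value is 1, and SSIM is evaluated from global statistics of the whole volume instead of the 7-voxel 3D Gaussian window described in the text. The minimum standard deviation `min_std` is a hypothetical value.

```python
import numpy as np

def psnr(y, y_hat):
    """Peak signal-to-noise ratio in dB, assuming a peak intensity of 1."""
    mse = np.mean((y - y_hat) ** 2)
    return 10.0 * np.log10(1.0 / mse)

def covariance(y, y_hat):
    return np.mean((y - y.mean()) * (y_hat - y_hat.mean()))

def pcc(y, y_hat, min_std=1e-8):
    """Pearson's correlation with a floor on the standard deviations."""
    sigma_y = max(y.std(), min_std)
    sigma_h = max(y_hat.std(), min_std)
    return covariance(y, y_hat) / (sigma_y * sigma_h)

def ssim_global(y, y_hat, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM from whole-volume statistics (no sliding Gaussian window)."""
    mu_y, mu_h = y.mean(), y_hat.mean()
    numerator = (2 * mu_y * mu_h + c1) * (2 * covariance(y, y_hat) + c2)
    denominator = (mu_y ** 2 + mu_h ** 2 + c1) * (y.var() + y_hat.var() + c2)
    return numerator / denominator

# Toy volume standing in for a normalized FL tomogram.
rng = np.random.default_rng(0)
y = rng.random((8, 8, 4))
psnr_offset = psnr(y, y + 0.1)   # uniform error of 0.1 gives an MSE of 0.01
pcc_offset = pcc(y, y + 0.1)     # correlation is blind to a constant offset
```

The toy comparison illustrates the complementarity noted above: a uniform offset lowers PSNR to 20 dB while PCC remains at 1.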
Following recent Bayesian deep-learning approaches for computer vision, two types of uncertainty were considered: data (aleatoric) and model (epistemic) uncertainty. While the precise origins and mathematical derivations have been extensively reviewed elsewhere 20 , here, we describe the uncertainty quantification schemes that are well-suited for RI2FL. Data uncertainty was quantified by test-time augmentation 17 with image transforms such as flip and rotation, which was compatible with the aforementioned loss function. Model uncertainty was quantified using Monte Carlo dropout 16 . In both cases, we quantified the mean and standard deviation in the FL output space after the perturbation of either the data or model. The two calculated standard deviation maps defined the data and model uncertainty. The average of the two mean prediction maps defined the final inferred FL, which also slightly increased the performance. We did not apply these schemes to the stitching or time-series (except for the cell-division example) data due to high computational costs.
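The two uncertainty estimates and their combination can be sketched as follows. The choice of flips along the first two array axes, the number of Monte Carlo samples and the toy models are assumptions made for illustration; a real network with dropout kept active at inference time would replace `toy_dropout_model`.

```python
import numpy as np

def tta_predict(model, ri):
    """Test-time augmentation: flip, predict, undo the flip, then aggregate."""
    preds = []
    for axes in [(), (0,), (1,), (0, 1)]:
        augmented = np.flip(ri, axis=axes) if axes else ri
        pred = model(augmented)
        preds.append(np.flip(pred, axis=axes) if axes else pred)
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)

def mc_dropout_predict(stochastic_model, ri, n_samples=20):
    """Monte Carlo dropout: repeat stochastic forward passes and aggregate."""
    preds = np.stack([stochastic_model(ri) for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)

def infer_with_uncertainty(model, stochastic_model, ri):
    """Average the two mean maps; return both standard deviation maps."""
    mean_data, data_unc = tta_predict(model, ri)
    mean_model, model_unc = mc_dropout_predict(stochastic_model, ri)
    return 0.5 * (mean_data + mean_model), data_unc, model_unc

# Toy stand-ins for a trained network (hypothetical).
rng = np.random.default_rng(0)

def toy_model(v):
    return 2.0 * v

def toy_dropout_model(v, p=0.1):
    mask = rng.random(v.shape) >= p          # randomly drop voxels
    return 2.0 * v * mask / (1.0 - p)        # inverted-dropout rescaling

ri = rng.random((8, 8, 4)) + 1.33
fl, data_unc, model_unc = infer_with_uncertainty(toy_model, toy_dropout_model, ri)
```

Since the toy deterministic model commutes with flips, its data uncertainty is zero everywhere; a trained network is generally not exactly flip-equivariant, and the residual disagreement is what the data uncertainty map records.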
Single-cell profiling. After inference with RI2FL, a variety of open-source computational tools developed for FL data could be readily utilized. To segment the single cells and nuclei in the tomograms, we first trained a random forest voxel classifier provided by Ilastik 45 based on the inferred nuclei and plasma membrane channels. The voxels were sparsely annotated as background, cytoplasm or nucleus for a handful of tomograms, and the trained classifier generated the voxel-wise class probability maps for the entire dataset. Then, the single nuclei could be readily segmented by thresholding the nuclei probability. The tentative cells, obtained by thresholding the summation of the cytoplasm and nuclei probability, were segmented by marker-controlled watershed segmentation, provided by CellProfiler 23 , using the identified nuclei as segmentation markers. The segmentation performance was robust due to the high specificity of the inferred FL channels.
We extracted a variety of single-cell features from the segmented single-cell/nuclei volume masks, the additional inferred FL channels (actin, mitochondria, lipid droplets and nucleoli) and the measured RI channel aligned in a common coordinate system. The calculation of the mass-related features was based on the well-characterized linear dependence of RI, $n(x,y,z)$, on the dry mass density, $C(x,y,z)$, for biological samples 11,24 : $n(x,y,z) = n_{m} + \alpha C(x,y,z)$, where $n_{m}$ and $\alpha$ are the RI of the imaging medium ($n_{m} = 1.337$ at λ = 532 nm) and the RI increment ($\alpha = 0.190$ ml g⁻¹ at λ = 532 nm), respectively. The mass density and actin density described in the main text indicate the cellular dry mass density and cytoplasmic actin mean, respectively, in Supplementary Table 4. For the time-lapse experiments, we used the frame-wise application of the segmentation and feature extraction procedures to quantify the feature dynamics. We used Matlab (MathWorks) to implement the feature extraction script.
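Inverting the linear relation gives the dry mass density directly from the RI, $C = (n - n_{m})/\alpha$. The short Python sketch below illustrates this with the constants from the text; the function names and the voxel volume are hypothetical, and the original feature extraction was implemented in Matlab.

```python
import numpy as np

N_MEDIUM = 1.337  # RI of the imaging medium at 532 nm (from the text)
ALPHA = 0.190     # RI increment in ml/g at 532 nm (from the text)

def dry_mass_density(ri):
    """Dry mass density in g/ml from RI, using C = (n - n_m) / alpha."""
    return (np.asarray(ri, dtype=float) - N_MEDIUM) / ALPHA

def dry_mass_pg(ri, mask, voxel_volume_um3):
    """Total dry mass in pg inside `mask`; note that 1 g/ml equals 1 pg per cubic micrometre."""
    return float(dry_mass_density(ri)[mask].sum() * voxel_volume_um3)

# Worked example: a voxel with RI 1.375 has (1.375 - 1.337) / 0.190 = 0.2 g/ml.
density = dry_mass_density(1.375)

# Toy 2 x 2 x 2 "cell" with a hypothetical voxel volume of 0.5 cubic micrometres.
ri = np.full((2, 2, 2), 1.375)
mask = np.ones((2, 2, 2), dtype=bool)
mass = dry_mass_pg(ri, mask, voxel_volume_um3=0.5)  # 8 voxels x 0.2 x 0.5 = 0.8 pg
```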
Uniform manifold approximation and projection (UMAP), implemented in Python 3, was used for the unsupervised nonlinear embedding of the features for 2D visualization 27 . All features were z-scored before UMAP, and the hyperparameters were a minimum distance of 0.5 with five neighbours.
Statistics and reproducibility. All statistical tests were performed using the SciPy package in Python. Unpaired two-sided Student's t-test was used to calculate the P values in Figs. 2c and 6c. P < 0.05 was considered statistically significant. One-way analysis of variance (ANOVA) was used to calculate the F values in Fig. 6b. The experiments in Figs. 1b, 3a, 3b, 3c and 4 and Extended Data Fig. 7 were performed 856, 10, 96, 96, 8 and 6 times, respectively, with similar results obtained. The numbers of samples are specified in the respective figure legends and/or Supplementary Table 2. Each error bar, unless otherwise indicated, represents standard deviation (s.d.).
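For orientation, the two tests named above correspond to the following SciPy calls. The three groups are synthetic numbers generated only for this illustration and do not represent any measurement from the study.

```python
import numpy as np
from scipy import stats

# Synthetic single-cell feature values for three hypothetical conditions.
rng = np.random.default_rng(0)
group_a = rng.normal(1.0, 0.1, size=30)
group_b = rng.normal(1.2, 0.1, size=30)
group_c = rng.normal(1.0, 0.1, size=30)

# Unpaired Student's t-test (equal variances); two-sided is SciPy's default.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)
significant = p_value < 0.05

# One-way ANOVA across the three groups.
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)
```

Setting `equal_var=True` selects the classical Student's test; `equal_var=False` would instead give Welch's test, which is not what the text describes.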
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
RI2FL datasets are available at https://github.com/NySunShine/ri2fl. Source data are provided with this paper. All other data supporting the findings of this study are available from the corresponding author on reasonable request.

Fig. 7 | Intracellular distribution of inferred lipid droplets and measured endosomes. Unexpectedly, the intracellular distribution of the inferred lipid droplets was starkly different from that of the ground-truth endosomes (see main text). While the lipid droplets were strongly correlated with high RI, this was not the case for endosomes.