Abstract
Observing subcellular structures labeled with distinguishable fluorescent probes under the microscope is one of the key technologies in cell biology research. However, owing to spectral overlap, traditional multi-channel sequential imaging of different-colored structures struggles to overcome both the limited number of labels in a single cell and the delay between channels. Here we propose a double-structure network (DBSN), composed of multiple networks, which can extract six subcellular structures from three images using only two kinds of fluorescent markers. DBSN combines intensity-balance models, which even up the diverse densities of fluorescent labels across structures, with structure-separation models, which extract multiple distinct structures from a single image. The experimental results show that DBSN breaks the bottleneck of existing technologies for studying the dynamic interaction of organelles and provides a new possibility for drawing the interaction network of organelles.
Introduction
Studying the interaction between subcellular organelles has become one of the key research directions in cell biology. Equipped with fluorescent-probe-based techniques and microscopy imaging, researchers can observe the distribution and translocation of organelles in living cells. Nevertheless, these technologies have several disadvantages.
With the development of microscopy imaging, especially super-resolution fluorescence microscopy, increasingly detailed observations of organelles have become possible. However, most existing super-resolution imaging technologies need multiple raw frames to achieve higher spatial resolution, at the cost of sacrificing temporal resolution. Structured illumination microscopy (SIM) [1], a representative super-resolution technology that relies on delicate hardware to bypass the limit of the point spread function (PSF), requires 9 or 15 raw frames. Complex algorithms built on rigorous mathematical and physical derivations can also improve image resolution and achieve super-resolution reconstruction [2], but they require hundreds to thousands of raw frames. The technologies with the highest spatial resolution, such as stochastic optical reconstruction microscopy (STORM) [3], photoactivated localization microscopy (PALM) [4], and DNA points accumulation for imaging in nanoscale topography (DNA-PAINT) [5], accomplish super-resolution reconstruction with single-molecule localization algorithms that require tens of thousands of raw frames for one high-resolution image. Therefore, the temporal resolution of these super-resolution reconstruction methods is restricted by the excessive demand for raw data. In recent years, deep learning has been used in many studies to reduce the demand for raw data [6–9] while maintaining comparable spatial resolution. Nevertheless, a certain amount of raw data remains necessary to reconstruct a super-resolution image.
The canonical method for multi-structure imaging is to label different subcellular structures specifically with fluorescent probes of different excitation or emission spectra and then perform multi-channel sequential imaging of the different-colored structures. Because of the overlap of fluorescence spectra, at most four laser lines are commonly used in biological imaging research: 405 nm, 488 nm, 561 nm, and 640 nm. In addition, the switching time of the hardware cannot be ignored when multiple subcellular structures are imaged sequentially by switching the filter or laser source. Therefore, this technology is only applicable to targets whose spatial positions remain almost constant during the period of multi-channel sequential imaging. If the above super-resolution imaging technologies that require multiple raw frames are used to explore organelle interactions, the consequences of the poor temporal resolution become even more severe.
Researchers have devoted considerable effort to increasing the number of labels in a single cell. Jennifer Lippincott-Schwartz and colleagues adopted a multispectral image acquisition method that overcomes the challenge of spectral overlap in the fluorescent protein palette, extending the number of labels that can be distinguished in a single image to six [10]. Peng Xi and colleagues reported a spectrum and polarization optical tomography (SPOT) technique. Using Nile Red to stain lipid membranes, SPOT can simultaneously resolve membrane morphology, polarity, and phase from the three optical dimensions of intensity, spectrum, and polarization, respectively; these optical properties reveal the lipid heterogeneity of ten subcellular compartments [11]. To overcome the limited number of labels in a single cell, both approaches combine additional information, such as spectrum or polarization, with the image data, which increases the complexity of the hardware system. Moreover, the SPOT technique is only suitable for targets with membrane structures. Separately, considering that fluorescent labeling may affect cell function, label-free imaging is also a hot research topic. Several computational machine-learning approaches have been proposed that can predict fluorescent labels from unlabeled biological samples [12–14]. However, these technologies require special imaging systems and lack universality, and the complex background of bright-field images also reduces the accuracy of target-structure extraction.
In general, the technologies mentioned above suffer from inevitable problems such as poor temporal resolution, fluorescence crosstalk, and a limited number of probes. Here we propose a double-structure network (DBSN), composed of multiple networks, which can extract six subcellular structures from three images. The DBSN workflow is divided into three steps. For each cell sample, three microscopic images are first collected from three imaging channels: the bright-field image is used to segment the profiles of the nucleus and cell membrane, the fluorescence image of clathrin-coated pits (CCPs) and microtubules (MTs) is collected with a 488 nm laser, and the fluorescence image of the endoplasmic reticulum (ER) and adhesions is collected with a 561 nm laser. Both fluorescence images are then pre-processed by two intensity-balance models to even up the different gray-level information of the two structures in each image. Finally, the bright-field image and the two intensity-balanced images are fed into three different structure-separation models to extract the profiles of the nucleus and cell membrane, CCPs, microtubules, ER, and adhesions (Fig. 1 a)). In all, the proposed DBSN consists of five different models, all based on U-Net [15, 16]. The test results in this paper demonstrate the value of DBSN in improving microscopy imaging speed and its significant applications to the dynamic interaction of organelles in cells.
a) A bright-field image is collected for nucleus and cell membrane segmentation. An image of CCPs and microtubules is collected with a 488 nm laser, and an image of ER and adhesions is collected with a 561 nm laser. For each channel, an intensity-balance model evens up the intensity of the two structures in one fluorescence image. These two images, together with the bright-field image, are fed into three specific structure-separation models to extract six different subcellular structures. b-c) Fluorescence images of each subcellular structure (CCPs_ori, MTs_ori, ER_ori, and Adhesion_ori) are collected independently. The maximum gray value V of the two grouped images (CCPs_ori & MTs_ori, or ER_ori & Adhesion_ori) is used to pull the two images to the same intensity scale and generate new images (CCPs & MTs, or ER & Adhesion). Two randomly generated weight coefficients between 0.1 and 1 are used to generate an image with an intensity difference between the two structures (Merge_1). The image named Merge_2 is the direct sum of CCPs and MTs, or of ER and Adhesion. The two intensity-balanced images (CCPs & MTs, or ER & Adhesion) serve as the ground truth for the structure-separation network, while their sum (Merge_2) serves as the ground truth for the intensity-balance network and as the input of the structure-separation network. The image with an intensity difference between the two structures (Merge_1) serves as the input of the intensity-balance network. d) A Hoechst 34580-labeled nucleus image, a Paxillin-GFP-labeled adhesion image, and a bright-field image are collected for the same cell. The bright-field image is used as the input of the structure-separation network. e) The profiles of the nucleus and cell membrane are manually segmented from the Hoechst 34580 image and the Paxillin image and used as the ground truth of the structure-separation network. Scale bars: b)-c) 5 μm; d)-e) 10 μm.
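To make the three-step workflow concrete, the following minimal sketch outlines DBSN inference as described above. The function name, dictionary keys, and tensor layout are hypothetical placeholders for illustration only, not the released implementation (see Data availability).

```python
# A minimal sketch of DBSN inference, assuming each model maps an
# (N, C, H, W) tensor to an (N, C', H, W) tensor; names are placeholders.
import torch

def dbsn_inference(bright_field, img_488, img_561, models):
    """Extract six subcellular structures from three input images."""
    with torch.no_grad():
        # Step 1: even up the intensity of the two structures per channel.
        balanced_488 = models["balance_488"](img_488)   # CCPs + MTs
        balanced_561 = models["balance_561"](img_561)   # ER + adhesions
        # Step 2: split each balanced image into its two structure channels.
        ccps, mts = models["separate_488"](balanced_488).unbind(dim=1)
        er, adhesions = models["separate_561"](balanced_561).unbind(dim=1)
        # Step 3: segment nucleus and membrane from the bright-field image.
        nucleus, membrane = models["separate_bf"](bright_field).unbind(dim=1)
    return ccps, mts, er, adhesions, nucleus, membrane
```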
Results
Training datasets for DBSN
When observing multiple subcellular structures in the same cell at the same time, it is crucial to tag these structures with distinguishable fluorescent probes. However, it is always challenging to ensure that the density of fluorescent markers on different structures is at the same level (Suppl. Fig. 1). This directly leads to brightness discrepancies when numerous subcellular structures are displayed in the same projection image. Traditionally, in studies of multicolored structures, a result map with relatively balanced gray-level information among structures can be composed by adjusting the contrast of the individual channels before superimposing them. However, this approach entails the time-consuming procedure of multi-channel sampling. We therefore aim to mark multiple subcellular structures in the same cell with the same color to achieve synchronous imaging. In this case, the problem of intensity balance becomes particularly important for multi-structure images.
All training datasets used in this study are synthetic. The fluorescence data of each subcellular structure were collected independently. Taking the two-dimensional spatial distribution of the four subcellular structures into consideration, we grouped CCPs with microtubules and ER with adhesions (Fig. 1 b-c)). The preparation of training data can be divided into three steps. For each sample, we first found the maximum gray value of the two fluorescence images in the same group. This maximum was then used to pull the two images to the same intensity scale. The two intensity-balanced single-structure images served as the ground truth for the structure-separation network, while their sum served as the ground truth for the intensity-balance network and as the input of the structure-separation network. Finally, two weight coefficients between 0.1 and 1 were randomly generated and multiplied with the two intensity-balanced images; their sum, an image with a random intensity difference between the structures, was the input of the intensity-balance network. A sketch of this procedure is given below.
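A minimal sketch of this three-step synthesis follows, assuming that "pulling to the same intensity scale" means rescaling each image so that its maximum equals the shared maximum V (one plausible reading of Fig. 1 b-c)):

```python
# Sketch of synthetic training-pair generation; img_a and img_b are
# single-structure float images from the same group (e.g. CCPs and MTs).
import numpy as np

def make_training_pair(img_a, img_b, rng=np.random.default_rng()):
    v = max(img_a.max(), img_b.max())         # shared maximum gray value V
    bal_a = img_a * (v / img_a.max())         # intensity-balanced images:
    bal_b = img_b * (v / img_b.max())         # GT for the separation net
    merge_2 = bal_a + bal_b                   # GT for the balance net and
                                              # input for the separation net
    w_a, w_b = rng.uniform(0.1, 1.0, size=2)  # random weights in [0.1, 1]
    merge_1 = w_a * bal_a + w_b * bal_b       # input for the balance net
    return merge_1, merge_2, bal_a, bal_b
```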
To further reduce the complexity of fluorescent labeling, the bright-field image was used to segment the nucleus and cell membrane. Fig. 1 d) shows how the training dataset for the segmentation of nucleus and membrane profiles was prepared. For the same cell, a Hoechst 34580-labeled nucleus image, a Paxillin-GFP-labeled adhesion image, and a bright-field image were collected. The bright-field image was the input of the structure-separation network, and the profiles manually segmented from the Hoechst 34580 and Paxillin images were the ground truth (Fig. 1 e)).
Intensity-balance network for multi-structure in a single image
The architecture of the intensity-balance network is described in Fig. 2 a). The fluorescent dual-structure image with an intensity difference is the input of the network, and the intensity-balanced image is the ground truth. The enlarged views in the figure show structural information that vanishes because of the intensity difference and that can be reconstructed by the intensity-balance network. Fig. 2 b-c) illustrates the recovery results of the intensity-balance network. In each panel, the first column is the input data with a different intensity ratio between the two structures; the intensity ratio is marked at the bottom left of each input picture.
a) The architecture of the intensity-balance network. The fluorescent dual-structure image with an intensity difference is the input of the network; the intensity-balanced image is the ground truth. b-c) Intensity-balanced results. In each panel, the first column is the input data with a different intensity ratio between the two structures; the intensity ratio is marked at the bottom left of each input picture. The second column is the ground truth, and the last column is the output of the network. The gray-intensity profile (white line) in each picture is taken along the yellow dashed line in the corresponding image. d-e) Quantitative evaluation of the intensity-balance models. Scale bars: a) 5 μm, 1 μm; b-c) 3 μm.
When the maximum intensity of the CCPs image is only one-tenth of that of the microtubule image, the CCPs are barely visible in the input image. A similar situation can be observed in the combined image of ER and adhesions: when the maximum intensity of the ER image is only one-fifth of that of the adhesions image, the shape of the ER can hardly be distinguished in the input image. Conversely, when the intensity of CCPs or ER is several times higher than that of the other structure, the information of the other structure almost disappears from the input image. The reference images are the sums of the manually intensity-balanced images, and the last column in the figure is the output of the network. The gray-intensity profile (white line) in each picture is taken along the yellow dashed line in the corresponding image. Compared with the input image, the gray-value profile of the DBSN output is much closer to that of the ground truth and contains more information.
Finally, we statistically evaluated the performance of the intensity-balance models in terms of peak signal-to-noise ratio (PSNR), normalized root-mean-square error (NRMSE), and structural similarity (SSIM), using manually intensity-balanced images as references (Fig. 2 d-e)). The PSNR, NRMSE, and SSIM results confirm that DBSN can significantly even up the intensity information in the input. The same three metrics were also used to monitor the training of the intensity-balance network (Suppl. Fig. 2): we computed them every 50 epochs and selected the optimized model accordingly. The sketch below shows one way such metrics can be computed.
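As an illustration, the three metrics can be computed with scikit-image as follows; the paper states only that the calculation follows earlier work [7], so this is an assumed rather than a verbatim implementation.

```python
# Sketch of metric computation with scikit-image; pred and gt are float
# arrays on the same intensity scale.
from skimage.metrics import (normalized_root_mse,
                             peak_signal_noise_ratio,
                             structural_similarity)

def evaluate(pred, gt):
    data_range = gt.max() - gt.min()
    return {
        "PSNR": peak_signal_noise_ratio(gt, pred, data_range=data_range),
        "NRMSE": normalized_root_mse(gt, pred),
        "SSIM": structural_similarity(gt, pred, data_range=data_range),
    }
```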
Structure-separation network for multi-structure in a single image
After pre-processing such as intensity equalization, the multiple subcellular structure images were superimposed into one projection image as the input of the structure-separation network, and each single-structure microscopic image was used as the ground truth of the corresponding output channel. PSNR, NRMSE, and SSIM were again calculated to evaluate the structure-separation network during training (Suppl. Fig. 3): we computed the three metrics every 10 epochs and selected the optimized structure-separation model accordingly. A minimal sketch of such a model is given below.
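The paper states only that all five models are U-Net based [15, 16]; the depth, channel widths, and output head below are illustrative assumptions. A structure-separation model maps one merged input channel to two single-structure channels (an intensity-balance model could use the same backbone with a single output channel).

```python
# A minimal U-Net-style sketch of a structure-separation model:
# one merged input channel in, two single-structure channels out.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class SeparationUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(1, 64)
        self.enc2 = conv_block(64, 128)
        self.bottleneck = conv_block(128, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.dec2 = conv_block(256, 128)   # 256 = 128 (up) + 128 (skip)
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = conv_block(128, 64)    # 128 = 64 (up) + 64 (skip)
        self.head = nn.Conv2d(64, 2, 1)    # two channels: one per structure
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)
```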
The efficacy of the structure-separation network is presented in Fig. 3. In Fig. 3 a) and Fig. 3 c), from left to right are the input image, the ground truth of the two structures, and the network separation results. Fig. 3 b) and Fig. 3 d) are enlarged views of the regions enclosed by the yellow boxes in Fig. 3 a) and Fig. 3 c), with intensity profiles along the yellow dashed lines. The input image is a dual-structure overlay image after manual intensity balancing. As shown in Fig. 3 b), both CCPs and microtubules are visible in the input image. The two single-structure ground-truth images further clarify that a bright CCP spot is superimposed on the end of the microtubule indicated by the yellow dashed line. DBSN distinguishes these two targets cleanly: in the output, only CCPs are retained in the CCPs channel and only microtubules in the microtubule channel, and the gray-value fluctuations at the corresponding positions in the two output images closely match those of the reference images. As shown in the enlarged views of Fig. 3 d), ER and adhesions are merged in the input image, making the two structures difficult to distinguish. Compared with the ground-truth images, the output of DBSN is comparably reliable, which is verified by the quantitative evaluation in Fig. 3 e).
a) Structure separation of a CCPs and microtubules overlay image. From left to right: input image, ground truth of the two structures, and network separation results. b) Enlarged regions enclosed by the yellow box in (a), with intensity profiles along the yellow dashed lines. c) Structure separation of an ER and adhesions overlay image. From left to right: input image, ground truth of the two structures, and network separation results. d) Enlarged regions enclosed by the yellow box in (c), with intensity profiles along the yellow dashed lines. e) Quantitative evaluation of the structure-separation models. f) Nucleus and cell membrane segmentation results. Left: the input bright-field image. Right: the segmentation result, where the red curve is the manually labeled reference and the green curve is the output of the network. g-h) DBSN separates structures in dual-structure-labeled TIRFM images. g) COS-7 cells transfected with CCPs-GFP and EMTB-GFP plasmids to label CCPs and microtubules, respectively. h) COS-7 cells transfected with mCherry-KDEL and paxillin-RFP plasmids to label ER and adhesions, respectively. Scale bars: a), c) 3 μm; b), d) 1 μm; f)-h) 10 μm.
In Fig. 3 f), the left image is the input bright-field image, and the right image shows the manually labeled reference together with the output of the network. Note that training the nucleus and cell membrane segmentation model requires images containing the whole cell. To further verify the performance of the models, we used data collected by our home-built TIRFM imaging system as an application test (Fig. 3 g-h)). When two kinds of subcellular structures in the same cell were imaged synchronously with the same color marker, DBSN effectively separated the different structures into different channels.
Discussion
This study focuses on the rapid capture of the dynamic interactions of multiple subcellular structures in living cells. We employ deep learning to establish a reliable method for monochromatic multi-structure labeling and simultaneous imaging. Using the proposed DBSN, six kinds of subcellular structures can be extracted from three microscopic images of the same cell. The intensity-balance model addresses the problem that different densities of fluorescent markers on different subcellular structures lead to large intensity differences in the projection image. The structure-separation model breaks through the laser-band restriction whereby at most four types of subcellular structures can be labeled distinguishably in a single cell. Based on DBSN, multiple subcellular structures can be labeled with the same color and imaged in parallel. This truly achieves synchronous recording of the interactions between multiple subcellular structures and helps to draw the interaction network of organelles. The method also opens a path for studying organelle interactions with future technologies that require multiple raw frames to reconstruct a super-resolution map. According to our results, it is possible, relying on deep learning, to simultaneously separate even more than two kinds of structures tagged with the same fluorescent color.
Moreover, handling the different focusing states of subcellular structures in the two-dimensional projection (Suppl. Fig. 1) would further improve the performance of DBSN. A cycle-generative-adversarial-network-based model with a multi-component weighted loss function has been used to solve the out-of-focus issue in microscopy [17]. If a defocus-correction model were integrated into DBSN, the reliability of multi-structure synchronous imaging would improve further. Using a defocus-correction model or the intensity-balance model to preprocess the data before applying the structure-separation model would be even more helpful for drawing the interaction network of organelles.
This project was built on the public BioSR dataset [18]. However, open-source public datasets for microscopy image analysis are still scarce, and even a slight difference in imaging procedure can degrade the performance of trained models. Using synthetic or public datasets for pretraining and then applying transfer learning to the data at hand is a practical way to reach satisfactory results. As deep learning technologies develop, intelligent augmented microscopes [19] and event-driven microscopes [20–21] have also aroused researchers' interest. However, all of these technologies rely on large quantities of high-quality annotated data, so establishing a credible public platform for microscopy images is very important.
Materials and methods
Microscopes
The multi-angle ring-illumination TIRFM system was built on a classical objective-type Olympus IX83 TIRF microscope equipped with 405 nm, 488 nm, and 561 nm laser lines; the objective lens is a 100X (1.45 NA) and the EMCCD camera is an Andor iXon3. For Hoechst 34580-labeled nucleus imaging, a 405-nm laser (5 mW) was used for excitation; for Paxillin-GFP-labeled imaging, a 561-nm laser (100 mW) was used. These wide-field images were acquired under the control of cellSens software with an exposure time of 100 ms. Bright-field images of the same cell were also collected.
Cell culture and datasets preparation
For the training and testing steps, the original SIM raw images of CCPs, microtubules, and ER were downloaded from the BioSR dataset [18], and the original adhesion SIM raw images were from our previous work [7]. The average images of all the above SIM raw stacks were used in DBSN.
COS-7 cells were cultured in high glucose Dulbecco’s modified Eagle’s medium (DMEM) (Cytiva, SH30243.01B), supplemented with 10% fetal bovine serum (HyClone, SV30087) and 1% penicillin–streptomycin (Beyotime, C0222) at 37°C in a humidified 5% CO2 incubator. Cells were grown on a 35-mm glass bottom dish (Cellvis, D35-20-1-N) for fluorescence staining experiments.
To visualize and segment the cell membrane, COS-7 cells were transfected with paxillin-GFP, and the nucleus was stained with Hoechst 34580 (Invitrogen™, H21486). For each cell, a nucleus image, an adhesion image, and a bright-field image were collected; these three images were used to train the nucleus and cell membrane segmentation model. For the application step, COS-7 cells were transfected with CCPs-GFP, EMTB-GFP, paxillin-RFP, and mCherry-KDEL.
Image preprocessing
To train the intensity-balance models and the fluorescence structure-separation models, we cropped the original image stacks into smaller patches to generate more training samples: images of CCPs, microtubules, adhesions, and ER were cropped into 256 × 256 pixel patches. To prepare the dataset for the nucleus and cell membrane segmentation model, the whole cell must be contained in the image; therefore, instead of cropping patches, we resized the original images to 256 × 256 pixels. After manually segmenting the profiles of the nucleus and cell membrane, image flipping, rotating, and mirroring were used to augment the dataset, as sketched below. In total, we obtained 600–800 samples for each experiment, which were then randomly divided into training and testing subsets. Detailed information about each dataset is given in Supplementary Table 1.
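A minimal sketch of this preprocessing, assuming patch positions are drawn at random (resizing and manual segmentation are omitted):

```python
# Sketch of patch extraction and flip/rotate/mirror augmentation.
import numpy as np

def random_patch(image, size=256, rng=np.random.default_rng()):
    """Crop one random size x size patch from a 2-D image."""
    y = rng.integers(0, image.shape[0] - size + 1)
    x = rng.integers(0, image.shape[1] - size + 1)
    return image[y:y + size, x:x + size]

def augment(image):
    """Return the original plus flipped, mirrored, and rotated variants."""
    variants = [image, np.flipud(image), np.fliplr(image)]
    variants += [np.rot90(image, k) for k in (1, 2, 3)]
    return variants
```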
Training details
We adopted the Python U-Net implementation previously reported [7]. Training and inference were performed on a platform with an Intel Core i9-10900KF CPU and an NVIDIA GeForce RTX 3080 GPU. During training, we used the Adam optimizer, initialized the networks randomly, and trained the models with a typical starting learning rate of 1 × 10⁻⁴. We trained the intensity-balance network and the structure-separation network for 5000 epochs and saved the models every 10 epochs. Representative plots of validation PSNR, NRMSE, and SSIM during the training of the different networks are shown in Supplementary Fig. 2 and Supplementary Fig. 3. The code for training and testing was written in Python with the PyTorch framework, and PSNR, NRMSE, and SSIM were calculated as in our previous work [7]. All source code will be available online (https://github.com/luhongjinzju/DBSN). A condensed sketch of the training loop is given below.
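The following sketch condenses the training configuration stated above (Adam, starting learning rate 1 × 10⁻⁴, 5000 epochs, checkpoints every 10 epochs); the loss function is not specified in the paper, so the pixel-wise MSE below is an assumption, and the checkpoint filenames are placeholders.

```python
# Sketch of the training loop; `loader` yields (input, target) image pairs.
import torch

def train(model, loader, epochs=5000, lr=1e-4, ckpt_every=10, device="cuda"):
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()  # assumption: the paper does not state the loss
    for epoch in range(1, epochs + 1):
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
        if epoch % ckpt_every == 0:
            torch.save(model.state_dict(), f"dbsn_epoch_{epoch:04d}.pt")
```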
Data availability
All relevant data are available within the article and the Source Data. Owing to size limitations, the training datasets are available from the corresponding author upon request. The PyTorch source code and part of the source image data have been uploaded to GitHub: https://github.com/luhongjinzju/DBSN. All other relevant data are available from the corresponding author upon request.
Author contributions
L.J. and Y.X. conceived the project. L.J., J.L., and H.Z. performed the imaging and prepared the figures. L.J. developed the code with the help of Y.Z., H.Y., and J.W. H.Z., J.L., and L.Z. prepared the biological samples. The manuscript was written with input from all authors, starting from a draft provided by L.J. Y.X. supervised the project.
Competing interests
The authors declare no competing interests.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (62105288 and 22104129), the National Key Research and Development Program of China (2021YFF0700305 and 2018YFE0119000), the Zhejiang Provincial Natural Science Foundation (LQ22F050018) and Fellowship of China Postdoctoral Science Foundation (2021M692831).