Abstract
Fluorescence microscopy is a key driver of discoveries in the life sciences, with observable phenomena being limited by the optics of the microscope, the chemistry of the fluorophores, and the maximum photon exposure tolerated by the sample. These limits necessitate trade-offs between imaging speed, spatial resolution, light exposure, and imaging depth. In this work we show how deep learning enables biological observations beyond the physical limitations of microscopes. Using seven concrete examples we illustrate how microscopy images can be restored even if 60-fold fewer photons are used during acquisition, how isotropic resolution can be achieved even with a 10-fold under-sampling along the axial direction, and how diffraction-limited structures can be resolved at 20-fold higher frame rates compared to state-of-the-art methods. All developed image restoration methods are freely available as open source software.
1 Introduction
Fluorescence microscopy is an indispensable tool in the life sciences for investigating the spatio-temporal dynamics of cells, tissues, and developing organisms. Recent advances, such as light-sheet microscopy [1–3], structured illumination microscopy [4, 5], and super-resolution microscopy [6–8] enable time resolved volumetric imaging of biological processes within cells at high resolution. The quality at which these processes can be faithfully recorded, however, is not only determined by the spatial resolution of the optical device used, but also by the desired temporal resolution, the total duration of an experiment, the required imaging depth, the achievable fluorophore density, bleaching, and photo-toxicity [9, 10]. These aspects cannot all be optimized at the same time – one must make trade-offs, for example, sacrificing signal-to-noise ratio by reducing exposure time in order to gain imaging speed. Such trade-offs are often depicted by a design-space tetrahedron [11] that has resolution, speed, phototoxicity, and depth at its four vertices (Figure 1a), with its volume being limited by a total photon budget [12].
These trade-offs can be addressed by optimizing the microscopy hardware; however, there are physical limits that cannot easily be overcome. Therefore, computational procedures to improve the quality of acquired microscopy images are becoming increasingly important. For instance, in the above-mentioned trade-off between exposure and speed, one could apply computational image restoration to maintain an image quality that is still sufficient for downstream data quantification at high acquisition speed. Super-resolution microscopy [4, 13–16], deconvolution [17–19], surface projection algorithms [20, 21], and denoising methods [22–24] are examples of sophisticated image restoration algorithms that can push the limits of the design-space tetrahedron, and thus allow one to recover important biological information that would be inaccessible by imaging alone.
Most common image restoration problems, however, have multiple possible solutions, and require additional assumptions in order to select one solution as the final restoration. These assumptions are typically general, e.g. requiring a certain level of smoothness of the restored image, and are therefore not dependent on the specific content of the images to be restored. Intuitively, a method that leverages available knowledge about the data at hand ought to reach superior restoration results.
Deep Learning (DL) is such a method, since it can learn to perform complex tasks on specific data [25, 26]. It employs large multi-layered neural networks that compute results after being trained on annotated example data (i.e. gold-standard, ground-truth data). Spectacular results reaching human-level performance have, for example, been achieved on the classification of natural images [27]. In biology, DL methods have, for instance, been applied to the automatic extraction of connectomes from large electron microscopy data [28] and to the classification of image-based high-content screens [29]. However, the direct application of DL methods to image restoration tasks in fluorescence microscopy is complicated by the absence of sufficiently large training datasets. In the context of fluorescence microscopy, manually generating such datasets would require an inordinate amount of careful expert annotation and is therefore simply not feasible.
In this paper we present a solution to the problem of missing training data for DL in fluorescence microscopy by developing strategies to generate training data without the need for manual annotation. This enables us to apply neural networks to image restoration tasks such as image denoising, surface projection, recovery of isotropic resolution, and the restoration of sub-diffraction structures. We show, in a variety of imaging scenarios, that trained content-aware restoration (CARE) networks produce results that were previously unobtainable. This means that the application of CARE to biological images allows one to transcend the limitations of the design-space tetrahedron (Figure 1a), pushing the limits of the possible in fluorescence microscopy through machine-learned image computation.
2 Results
In fluorescence microscopy one is often forced to image samples at low signal intensities, resulting in hard-to-analyze images with a low signal-to-noise ratio (SNR). One way to improve SNR is to increase laser power or exposure times, which, unfortunately, is usually detrimental to the sample, limiting the possible duration of the recording and introducing artifacts due to photo-damage. An alternative solution is to image at low SNR and later computationally restore the acquired images. Classical approaches, such as non-local-means denoising [22], can in principle achieve this, but without leveraging the available knowledge about the data at hand.
To address this problem with machine learning, we developed content-aware image restoration (CARE) networks, each adapted to a specific experimental setup, hypothesizing that they produce results superior to classical, content-agnostic methods. In the case of image denoising, we acquired pairs of images at low and high signal-to-noise ratios, used them as input and ground-truth to train CARE networks, and applied the trained networks to remove noise in previously unseen data.
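The data preparation behind this training scheme can be sketched as follows: matching patches are cut from a registered low/high-SNR pair and intensity-normalized before training. This is a minimal NumPy illustration only; the percentile values, patch size, and patch count are illustrative choices, not the exact parameters used in our experiments.

```python
import numpy as np

def normalize(img, pmin=2, pmax=99.8, eps=1e-8):
    # percentile-based normalization is robust to hot pixels and outliers
    lo, hi = np.percentile(img, pmin), np.percentile(img, pmax)
    return (img - lo) / (hi - lo + eps)

def sample_patch_pairs(low_snr, high_snr, patch=64, n=128, seed=0):
    # cut matching patches from a well-registered low/high-SNR pair;
    # the low-SNR patch is the network input, the high-SNR patch the target
    assert low_snr.shape == high_snr.shape
    rng = np.random.default_rng(seed)
    H, W = low_snr.shape
    xs, ys = [], []
    for _ in range(n):
        r = int(rng.integers(0, H - patch))
        c = int(rng.integers(0, W - patch))
        xs.append(low_snr[r:r + patch, c:c + patch])
        ys.append(high_snr[r:r + patch, c:c + patch])
    # trailing channel axis, as expected by most CNN frameworks
    return np.stack(xs)[..., None], np.stack(ys)[..., None]

# toy demonstration with synthetic data standing in for real acquisitions
rng = np.random.default_rng(1)
gt = np.zeros((256, 256)); gt[100:150, 100:150] = 1.0
noisy = gt + 0.5 * rng.standard_normal(gt.shape)
X, Y = sample_patch_pairs(normalize(noisy), normalize(gt))
```

A convolutional network is then trained to map X to Y with a pixel-wise loss; the architecture itself is described in Supp. Notes 2.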
Image Restoration with Physically Acquired Training Data
To demonstrate the power of this approach in biology, we applied it to the imaging of the flatworm Schmidtea mediterranea, a model organism for studying tissue regeneration. This organism is exceptionally sensitive to even moderate amounts of laser light [30], suffering muscle flinching at desirable illumination levels even when anesthetized (Supp. Video 1). Using a laser power that reduces flinching to an acceptable level results in images with such low SNR that they are impossible to interpret directly. Consequently, live imaging of S. mediterranea has thus far been intractable.
To address this problem with CARE, we imaged fixed worm samples at several laser intensities. We acquired well-registered pairs of images: a low-SNR image at a laser power compatible with live imaging, and a high-SNR image serving as ground-truth. We then trained a convolutional neural network¹ and applied the trained network to previously unseen live imaging data of S. mediterranea. We consistently obtained high quality restorations, even if the SNR of the images was very low, e.g. when acquired with a 60-fold reduced light-dosage (Figure 1c, Supp. Video 2, Supp. Figure 1-3). To quantify this observation, we measured the restoration error between prediction and ground-truth images for three different exposure and laser-power conditions. Both the NRMSE² and the SSIM³ error measures improved considerably when compared to results obtained by a potent baseline denoising method (Figure 1d, Supp. Figure 2). Moreover, while training a CARE network can take several hours, the restoration time for a volume of size 1024 × 1024 × 100 was less than 20 seconds on a single graphics processing unit⁴. In this case, CARE networks are able to take input data that are unusable for biological investigations and turn them into high-quality time-lapse data, providing the first practical framework for live-cell imaging of S. mediterranea.
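For reference, the two error measures can be computed as in the sketch below. For SSIM we show only the simplified single-window variant of Wang et al. [33]; practical implementations (as used in our evaluation) average a locally windowed version over the image, and the NRMSE normalization by the ground-truth range is one common convention.

```python
import numpy as np

def nrmse(pred, gt):
    # root-mean-square error, normalized by the ground-truth intensity range
    return np.sqrt(np.mean((pred - gt) ** 2)) / (gt.max() - gt.min())

def ssim_global(a, b, L=1.0):
    # single-window SSIM; C1, C2 stabilize the ratio (L = dynamic range)
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + C1) * (2 * cov + C2)) / (
        (mu_a ** 2 + mu_b ** 2 + C1) * (var_a + var_b + C2))

rng = np.random.default_rng(0)
gt = rng.random((64, 64))
noisy = gt + 0.2 * rng.standard_normal(gt.shape)
```

A perfect restoration yields NRMSE = 0 and SSIM = 1; successful denoising moves both measures toward those values.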
We next asked whether CARE improves common downstream analysis tasks in live-cell imaging, such as nuclei segmentation. We used light-sheet recordings of developing Tribolium castaneum (red flour beetle) embryos and, as before, trained a network on image pairs of samples acquired at high and low laser powers (Figure 1e). The resulting CARE network performs well even on extremely noisy, previously unseen live-imaging data, acquired with up to 70-fold reduced light-dosage compared to typical imaging protocols [34] (Supp. Notes 4, Supp. Video 3, Supp. Figure 4). In order to test the benefits of CARE for segmentation, we applied a simple nuclei segmentation pipeline to raw and restored image stacks of T. castaneum. The results show that, compared to manual expert segmentation, the segmentation accuracy (as measured by the standard SEG score [35]) improved from SEG = 0.47 on the classically denoised raw stacks to SEG = 0.65 on the CARE-restored volumes (Supp. Figure 5). Since this segmentation performance is achieved at significantly reduced laser power, the gained photon budget can now be spent on the imaging speed and light-exposure dimensions of the design-space tetrahedron. This means that Tribolium embryos, when restored with CARE, can be imaged longer and at higher frame rates, enabling improved tracking of cell lineages.
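The SEG score [35] matches each ground-truth object to the segmented object covering more than half of it and averages the resulting intersection-over-union values; a minimal sketch for 2D label images (illustrative, not the reference implementation of the benchmark):

```python
import numpy as np

def seg_score(gt_labels, pred_labels):
    # SEG: for each ground-truth object, find the predicted object covering
    # >50% of its pixels; score their intersection-over-union (0 if no such
    # object exists); average over all ground-truth objects.
    scores = []
    for g in np.unique(gt_labels):
        if g == 0:  # label 0 denotes background
            continue
        gt_mask = gt_labels == g
        ids, counts = np.unique(pred_labels[gt_mask], return_counts=True)
        best = ids[np.argmax(counts)]  # a >50% match, if any, is the argmax
        if best == 0 or counts.max() * 2 <= gt_mask.sum():
            scores.append(0.0)
            continue
        pred_mask = pred_labels == best
        iou = (gt_mask & pred_mask).sum() / (gt_mask | pred_mask).sum()
        scores.append(float(iou))
    return float(np.mean(scores))

gt = np.zeros((10, 10), int); gt[2:6, 2:6] = 1       # one 16-pixel nucleus
half = np.zeros((10, 10), int); half[2:6, 2:4] = 1   # covers exactly 50%
```

A perfect segmentation gives SEG = 1; an object covered by exactly half (not more) scores 0, reflecting the strict majority-overlap matching rule.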
Encouraged by the performance of CARE on two independent denoising tasks, we asked whether such networks can also solve more complex, composite tasks. In biology it is often useful to image a 3D volume and project it to a 2D surface for analysis, for example when studying cell behavior in developing epithelia of the fruit fly Drosophila melanogaster [36–38]. In this context, too, it is beneficial to optimize the trade-off between laser power and imaging speed, usually resulting in rather low-SNR images. This restoration problem is thus composed of projection and denoising, presenting the opportunity to test whether CARE networks can deal with such composite tasks.
For training, we again acquired pairs of low and high SNR 3D image stacks, and further generated 2D projection images from the high SNR stacks [20] to serve as ground-truth (Figure 2a). We developed a task-specific network architecture that consists of two jointly trained parts: a network for surface projection, followed by a network for image denoising (Figure 2b, Supp. Figure 9 and Supp. Notes 2). The results show that with CARE, reducing light dosage up to 10-fold has virtually no adverse effect on the quality of segmentation and tracking results obtained on the projected 2D images with an established analysis pipeline [39] (Figure 2c & d, Supp. Video 4, and Supp. Figure 7 & 8). Even for this complex task, the gained photon-budget can be used to move beyond the design-space tetrahedron, for example by increasing temporal resolution, and consequently improving the precision of tracking cell behaviors during wing morphogenesis [39].
Image Restoration with Semi-synthetic Training Data
Thus far, the application of CARE has relied on the availability of matching pairs of high and low quality images, both physically acquired at a microscope. However, this kind of data is not always available. Therefore, we investigated whether image pairs useful for training can also be obtained by computationally modifying existing microscopy images.
A common problem in fluorescence microscopy is that the axial resolution of volumetric acquisitions is significantly lower than the lateral resolution⁵. This anisotropy compromises the ability to accurately measure properties such as the shapes or volumes of cells. Anisotropy is caused by the inherent axial elongation of the optical point spread function (PSF), and the often low axial sampling rate of volumetric acquisitions necessitated by the requirement to image fast.
For the restoration of anisotropic image resolution, adequate pairs of training data cannot directly be acquired at the microscope. Instead, we took well-resolved lateral slices as ground truth and computationally modified them (i.e. applied a realistic imaging model, Supp. Notes 2) to resemble anisotropic axial slices of the same image stack. In this way, we generated matching pairs of images showing the same content at axial and lateral resolutions. These semi-synthetically generated pairs are suitable for training a CARE network that then restores previously unseen axial slices to nearly isotropic resolution (Figure 3a, Supp. Figure 15, Supp. Notes 2, and [40, 41]). To restore entire anisotropic volumes, we applied the trained network to all lateral image slices, taken in two orthogonal directions, and averaged the results into a single isotropic restoration (Supp. Notes 2).
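The semi-synthetic pair generation can be sketched in a few lines. This sketch uses a 1D Gaussian as a stand-in for the true axial PSF and nearest-neighbor upsampling; the realistic imaging model we actually apply is described in Supp. Notes 2.

```python
import numpy as np

def gauss_blur_axis0(img, sigma):
    # separable 1D Gaussian blur along axis 0 (stand-in for the axial PSF)
    r = int(3 * sigma)
    k = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
    k /= k.sum()
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, img)

def make_aniso_pair(lateral_slice, sigma=3.0, subsample=4):
    # ground truth: a well-resolved lateral (xy) slice
    # input: the same slice blurred along one axis and subsampled, mimicking
    # the elongated PSF and coarse z-sampling of an axial (xz) slice
    low = gauss_blur_axis0(lateral_slice, sigma)[::subsample]
    low_up = np.repeat(low, subsample, axis=0)      # naive upsampling back
    low_up = low_up[: lateral_slice.shape[0]]       # crop to original size
    return low_up, lateral_slice

x, y = make_aniso_pair(np.random.default_rng(0).random((64, 64)))
```

Because input and target show identical content, a network trained on such pairs learns to undo exactly the axial degradation of the imaging system.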
We applied this strategy to increase axial resolution of acquired volumes of fruit fly embryos [42], zebrafish retina [43], and mouse liver, imaged with different fluorescence imaging techniques. The results show that CARE improved the axial resolution in all three cases considerably (Figure 3b-d, Supp. Video 5 & 6, and Supp. Figure 10 & 14). In order to quantify this, we performed Fourier-spectrum analysis of Drosophila volumes before and after restoration, and showed that the frequencies along the axial dimension are fully restored, while frequencies along the lateral dimensions remain unchanged (Supp. Figure 11). Since the purpose of the fruit fly data is to segment and track nuclei, we applied a common segmentation pipeline [44] to the raw and restored images, and observed that the fraction of incorrectly identified nuclei was lowered from 1.7% to 0.2% (Supp. Notes 2, Supp. Figure 12 & 13). Thus, restoring anisotropic volumetric embryo images to effectively isotropic stacks leads to improved segmentation and will enable more reliable extraction of developmental lineages.
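The spectrum comparison can be reproduced in a few lines (an illustrative toy version; the analysis in Supp. Figure 11 operates on the actual Drosophila volumes):

```python
import numpy as np

def axis_spectrum(vol, axis):
    # power spectrum along one axis, averaged over all lines in that direction
    ps = np.abs(np.fft.rfft(vol, axis=axis)) ** 2
    other = tuple(a for a in range(vol.ndim) if a != axis)
    return ps.mean(axis=other)

def high_freq_fraction(spec):
    # fraction of spectral power in the upper half of the frequency band
    return spec[len(spec) // 2:].sum() / spec.sum()

# toy volume: isotropic noise blurred along z to mimic an anisotropic PSF
rng = np.random.default_rng(0)
vol = rng.standard_normal((32, 64, 64))
r, sigma = 6, 2.0
k = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2); k /= k.sum()
aniso = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, vol)

hf_axial = high_freq_fraction(axis_spectrum(aniso, axis=0))
hf_lateral = high_freq_fraction(axis_spectrum(aniso, axis=2))
```

For an anisotropic volume the high-frequency fraction along the axial direction is strongly suppressed relative to the lateral directions; a successful isotropic restoration brings the axial spectrum back up to the lateral one.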
The zebrafish and mouse liver data are examples of live and fixed two-channel imaging of large organs, both requiring high imaging speed and isotropic resolution for downstream analysis. While isotropy facilitates segmentation and the subsequent quantification of shapes and volumes of cells, vessels, or other biological objects of interest, higher imaging speed enables imaging of larger volumes and their tracking over time. Indeed, the respective CARE networks deliver the desired axial resolution with up to 10-fold fewer axial slices (Figure 3c & d), allowing one to reach comparable results ten times faster. Moreover, we observed that for these two-channel datasets, the network learned to exploit correlations between channels, leading to a better overall restoration quality compared to results based on individual channels (Supp. Figure 14).
Taken together, increasing isotropic resolution through CARE networks, trained on semi-synthetic pairs of images, benefits both imaging speed and accuracy of downstream analysis in many biological applications. Moreover, since training data can computationally be derived from the data to be restored, this method can be applied to any already acquired data set.
Image Restoration with Synthetic Training Data
Having seen the potential of using semi-synthetic training data for CARE, we next investigated whether reasonable restorations can be achieved from synthetic image data alone, i.e. without involving real microscopy data during training.
In most of the previous applications, one of the main benefits of CARE networks was improved imaging speed. Many biological applications additionally require resolving sub-diffraction structures in the context of live-cell imaging. Super-resolution imaging modalities achieve the necessary resolution, but suffer from low acquisition rates. On the other hand, widefield imaging offers the necessary speed, but lacks the required resolution. We tested whether CARE can computationally resolve sub-diffraction structures using only widefield images as input. To this end, we developed synthetic generative models of tubular and point-like structures that are commonly studied in biology. In order to obtain synthetic image pairs, suitable for training CARE networks, we used these generated structures as ground-truth, and computationally modified them to resemble actual microscopy data (Supp. Notes 2, Supp. Figure 17). We then used the trained networks to enhance widefield microscopy images containing tubular and point-like structures.
Specifically, we created synthetic ground-truth images of tubular meshes resembling microtubules, and point-like structures of various sizes mimicking secretory granules. We then computed synthetic input images by simulating the image degradation process, applying a PSF, camera noise, and background auto-fluorescence (Figure 4a, Supp. Notes 2, and Supp. Figure 17). Finally, we trained a CARE network on these generated image pairs and applied it to 2-channel widefield time-lapse images of rat INS-1 cells in which the secretory granules and the microtubules were labeled (Figure 4b). We observed that the restorations of both microtubules and secretory granules exhibit dramatically improved resolution, revealing structures imperceptible in the widefield images (Supp. Video 7, and Supp. Figure 16). To substantiate this observation, we compared the CARE restoration to the results obtained by deconvolution⁶, which is commonly used to enhance widefield images (Figure 4b). Line profiles through the data show the improved performance of the CARE network over deconvolution (Figure 4b). We additionally compared results obtained by CARE with super-resolution radial fluctuations (SRRF, [14]), a state-of-the-art method for reconstructing super-resolution images from widefield time-lapse data. We applied both methods to time-lapse widefield images of GFP-tagged microtubules in HeLa cells. The results show that both CARE and SRRF are able to resolve qualitatively similar microtubular structures (Figure 4c, Supp. Video 8). However, CARE reconstructions are at least 20 times faster, since they are computed from a single average of up to 10 consecutive raw images, whereas SRRF requires about 200 consecutive widefield frames.
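The forward simulation used to create synthetic input images can be sketched as follows. This is a simplified stand-in using a Gaussian PSF, a constant background, Poisson shot noise, and Gaussian read noise; the full simulation, including realistic structure generation, is described in Supp. Notes 2.

```python
import numpy as np

def degrade(gt, psf_sigma=2.0, background=0.1, photons=50, read_noise=0.02,
            seed=0):
    # simulate a widefield acquisition from synthetic ground truth
    rng = np.random.default_rng(seed)
    # separable Gaussian blur as a stand-in for the optical PSF
    r = int(3 * psf_sigma)
    k = np.exp(-0.5 * (np.arange(-r, r + 1) / psf_sigma) ** 2)
    k /= k.sum()
    blur = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, gt)
    blur = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, blur)
    signal = blur + background                          # auto-fluorescence
    noisy = rng.poisson(photons * signal) / photons     # shot noise
    noisy = noisy + read_noise * rng.standard_normal(gt.shape)  # camera noise
    return noisy

# synthetic "microtubule-like" ground truth: a single thin bright line
gt = np.zeros((128, 128)); gt[64, 20:108] = 1.0
x = degrade(gt)
```

The pair (x, gt) then plays the same role as a physically acquired low/high-quality image pair during training.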
Taken together, these results suggest that for structures that are straightforward to model, such as microtubules, CARE networks can enhance widefield images to a resolution usually only obtainable with super-resolution microscopy, yet at considerably higher frame rates.
Reliability of Image Restoration
We have shown that, with the right training data, CARE networks perform remarkably well on a wide range of image restoration tasks, opening new avenues for biological observations. However, as for any image processing method, the reliability of the results needs to be addressed.
To facilitate the evaluation of reliability of CARE network predictions, we designed them to predict a probability distribution for each pixel (Figure 5a). This distinguishes them from conventional image restoration approaches such as deconvolution [45, 46], where only a single restored intensity value is computed per pixel. For CARE networks, the mean of the distribution is used as the restored pixel value, while the width (variance) of each pixel distribution encodes the uncertainty of pixel predictions. Intuitively, narrow distributions signify high confidence, whereas broad distributions indicate low confidence pixel predictions. This allows us to provide per-pixel confidence intervals of the restored image (Figure 5a, and Supp. Figure 18 & 19).
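Concretely, the network head outputs two numbers per pixel, a mean and a scale (width), and is trained by minimizing the corresponding negative log-likelihood. The sketch below uses a Laplace parameterization as one possible choice; it is an illustration, not necessarily the exact likelihood of our implementation.

```python
import numpy as np

def laplace_nll(y, mu, b):
    # per-pixel negative log-likelihood of targets y under Laplace(mu, b);
    # minimizing this trains the network to predict signal and uncertainty
    return float(np.mean(np.log(2 * b) + np.abs(y - mu) / b))

def confidence_interval(mu, b, level=0.9):
    # per-pixel interval containing `level` of the predicted Laplace mass:
    # P(|y - mu| < t) = 1 - exp(-t / b)  =>  t = -b * log(1 - level)
    half = -b * np.log(1 - level)
    return mu - half, mu + half

# the loss is minimized by the scale that matches the true noise level
rng = np.random.default_rng(0)
y = rng.laplace(0.0, 1.0, size=1000)
mu = np.zeros_like(y)
losses = {b: laplace_nll(y, mu, b) for b in (0.5, 1.0, 2.0)}
```

Because over- and under-estimating the scale both increase the loss, the predicted width is a calibrated per-pixel uncertainty from which the confidence intervals of Figure 5a can be derived.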
These confidence intervals carry information about the reliability of CARE network predictions. We observed that variances tend to increase with restored pixel intensities. This makes it hard to intuitively understand which areas of a restored image are reliable or unreliable from a static image of per-pixel variances. Therefore, we visualize the uncertainty in short video sequences, where pixel intensities are randomly sampled from their respective distributions (Supp. Video 9). To a human observer, strong flicker in such videos highlights the areas where the uncertainty of image restorations is high.
In the context of machine learning, accuracy can often be increased by aggregating several trained predictors [47]. In addition, we reasoned that by analyzing the consistency of network predictions we can assess their reliability. To that end, we trained ensembles (Figure 5b) of about 5 CARE networks on randomized sequences of the same training data. We introduced a measure that quantifies the ensemble disagreement per pixel (Supp. Notes 3). This measure takes values between 0 and 1, with higher values signifying larger disagreement, i.e. smaller overlap among the distributions predicted by the networks in the ensemble. Using fly wing denoising as an example, we observed that in areas where different networks in an ensemble predicted very similar structures, the disagreement measure was low (Figure 5c, top row), whereas in areas where the same networks predicted obviously dissimilar solutions, the corresponding disagreement values were large (Figure 5c, bottom row). Therefore, training ensembles of CARE networks is useful to detect problematic image areas that cannot reliably be restored⁷.
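One simple way to instantiate such an overlap-based score (illustrative only; the exact per-pixel measure is defined in Supp. Notes 3) is one minus the shared probability mass of the ensemble members' predicted distributions:

```python
import numpy as np

def disagreement(mus, bs, lo=-10.0, hi=10.0, n=4001):
    # 1 minus the common mass under all predicted Laplace densities: 0 for
    # identical predictions, near 1 when the distributions barely overlap
    x = np.linspace(lo, hi, n)
    pdfs = [np.exp(-np.abs(x - m) / b) / (2 * b) for m, b in zip(mus, bs)]
    common = np.minimum.reduce(pdfs).sum() * (x[1] - x[0])  # Riemann sum
    return 1.0 - float(common)

d_same = disagreement([0.0, 0.0], [1.0, 1.0])   # identical predictions
d_near = disagreement([0.0, 1.0], [1.0, 1.0])   # slightly shifted means
d_far  = disagreement([-4.0, 4.0], [0.5, 0.5])  # incompatible predictions
```

Applied per pixel across the ensemble, this yields a disagreement map in [0, 1] that flags image regions where the networks propose incompatible restorations.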
Availability of Proposed Methods
Code for network training and prediction (written in Python using Keras [48] and TensorFlow [49]) is publicly available⁸. Furthermore, to make our restoration models readily usable, we developed user-friendly Fiji plugins and KNIME workflows (Supp. Figure 21 & 22).
3 Discussion
We have introduced content-aware image restoration (CARE) networks designed to restore fluorescence microscopy data. A key feature of our approach is that generating training data does not require laborious manual annotations. Application of CARE to raw images significantly expands the realm of observable biological phenomena. With CARE, flatworms can be imaged without unwanted muscle contractions, beetle embryos can be imaged far more gently and therefore longer and faster, large tiled scans of entire Drosophila wings can be imaged and simultaneously projected at dramatically increased temporal resolution, isotropic restorations of embryos and large organs can be computed from existing anisotropic data, and sub-diffraction structures can be restored from widefield systems at high frame rates. In all these examples, CARE allows one to invest the photon budget saved during imaging into improving the acquisition parameters relevant for a given biological problem, such as speed of imaging, photo-toxicity, isotropy, or resolution.
Whether an experimentalist is willing to make the above-mentioned investment depends on their trust that a CARE network is accurately restoring the image. This is a valid concern that applies to any image restoration approach. What sets CARE apart is the availability of additional readouts: per-pixel confidence intervals and ensemble disagreement scores. While strong disagreement indicates untrustworthy predictions, the converse is not necessarily true, since all networks could simply make the same or similar mistakes. Still, the proposed disagreement score allows users to identify image regions where restorations might not be accurate.
We have shown multiple examples where image restoration with CARE networks positively impacts downstream image analysis, such as the segmentation and tracking needed for extracting developmental lineages. Interestingly, in the case of Tribolium, CARE improved segmentation by efficient denoising, whereas in the case of Drosophila, segmentation was improved by increasing the isotropy of volumetric acquisitions. These two benefits are not mutually exclusive and could very well be combined. In fact, we have shown, on data from developing Drosophila wings, that composite tasks can jointly be trained. Future explorations of jointly training composite networks will further broaden the applicability of CARE to complex biological imaging problems.
Yet, CARE networks cannot be applied to all existing image restoration problems. For instance, the proposed isotropic restoration relies on the implicit assumption that the PSF is constant throughout the image volume, which often is not the case deep inside tissues. Additionally, CARE is not feasible when imaging novel biological structures for which ground-truth can neither be physically acquired nor synthetically modeled, even in principle. The synthetic generation of training data could, in general, benefit from recent advances in computer vision, such as generative adversarial networks [50]. Furthermore, the disagreement score we introduced could be used to identify instances where synthetic data are not accurate enough. This would allow one to iteratively adjust the modeling of synthetically generated biological structures and in turn improve CARE restorations.
Overall, our results show that fluorescence microscopes can, in combination with content-aware restorations, operate at higher frame rates, shorter exposures, and lower light intensities, while reaching higher resolution, thereby improving downstream analysis. The technology described here is readily accessible to the scientific community through the open source tools we provide. We predict that the current explosion of image data diversity and the ability of CARE networks to automatically adapt to various image contents will make such learning approaches prevalent for biological image restoration and will open new windows into the inner workings of biological systems across scales.
Author contributions
M.W. and L.R. initiated the research. M.W. and U.S. designed and implemented the training and validation methods. U.S., M.W., and F.J. designed and implemented the uncertainty readouts. T.B., A.M., A.D., S.C., F.S.M., R.H., M.R.M., and A.J. collected experimental data. A.D., C.B., and F.J. performed cell segmentation analysis. T.B. performed analysis on flatworm data. F.J., B.W., and D.S. designed and developed the Fiji and KNIME integration. E.W.M. supervised the project. F.J., M.W., P.T., L.R., U.S., and E.W.M. wrote the manuscript, with input from all authors.
Acknowledgements
The authors thank Philipp Keller (Janelia) for providing Drosophila data. We thank Suzanne Eaton (MPI-CBG), Franz Gruber, and Romina Piscitello for sharing their expertise in fly imaging and providing fly lines. We thank Anke Sönmez for cell culture work. We thank Marija Matejcic (MPI-CBG) for generating and sharing the LAP2B transgenic line Tg(bactin:eGFP-LAP2B). We thank the following services and facilities of the MPI-CBG for their support: Computer Department, Light Microscopy Facility (LMF), and Fish Facility. This work was supported by the German Federal Ministry of Research and Education (BMBF) under the codes 031L0102 (de.NBI) and 031L0044 (Sysbio II). M.S. was supported by the German Center for Diabetes Research (DZD e.V.). R.H. and S.C. were supported by grants from the UK BBSRC (BB/M022374/1; BB/P027431/1; BB/R000697/1), the UK MRC (MR/K015826/1), and the Wellcome Trust (203276/Z/16/Z).
Footnotes
† Shared last authors
1. The network architecture we used is based on [31, 32] (Supp. Figure 6 and Supp. Notes 2).
2. Normalized root-mean-square error.
3. Structural similarity index, measuring the perceived similarity between two images [33].
4. We used a Nvidia Titan X GPU for all presented experiments.
5. Modalities that would allow (close to) isotropic acquisitions are rare, e.g. multi-view light-sheet microscopy [19].
6. We used the on-board deconvolution procedure shipped with the DeltaVision OMX microscope.
7. Another example of the utility of ensemble disagreement can be found in Supp. Figure 20.