A real-time compression library for microscopy images

Fluorescence imaging techniques such as single molecule localization microscopy, high-content screening and light-sheet microscopy are producing ever-larger datasets, which poses increasing challenges in data handling and data sharing. Here, we introduce a real-time compression library that allows for very fast (beyond 1 GB/s) compression and de-compression of microscopy datasets during acquisition. In addition to an efficient lossless mode, our algorithm also includes a lossy option, which limits pixel deviations to the intrinsic noise level of the image and yields compression ratio of up to 100-fold. We present a detailed performance analysis of the different compression modes for various biological samples and imaging modalities.


Main
Advancements in fluorescence microscopy technologies such as in high-content screening [1]- [3], light-sheet microscopy [4]- [8], and single molecule localization microscopy [9]- [11] opened new perspectives in biology by increasing the speed of imaging, the number of specimens or the resolution of the observed structures. Even though these methods bring undeniable advantages, the data production speed and experiment sizes (Fig. 1a, Supplementary Table 1) are increasing in such a fast pace that in many cases data handling quickly becomes a bottleneck for new discoveries [12]- [14]. A straightforward solution to this problem is to perform image compression. Nonetheless, this typically implies incompatibilities with certain software packages, slow compression speed, and only moderate file size reduction for lossless methods. Although the compression ratio (original size / compressed size) can be substantially increased with lossy compression algorithms, their use is often discouraged [15] as the degree of information loss heavily depends on the image content and cannot be explicitly controlled.
To address these challenges, we developed a new compression library called B 3 D, which is capable of extremely fast compression and decompression of large microscopy datasets. Our library is built on the CUDA architecture [16] for GPU-based compression, which not only enables high processing speed, but also relieves load on the central processing unit, allowing compression directly during image acquisition. The algorithm has two main components. First, a prediction is made for each pixel based on the neighboring pixel values, and second, the prediction errors are run-length and Huffman encoded to effectively reduce the data size (Supplementary Note and Supplementary Fig. 1). We compared our algorithm's performance with TIFF (LZW), JPEG2000, and the speed-optimized KLB [17] by measuring compression speed, decompression speed and resulting file size (Fig. 1b). Only B³D is capable of handling the sustained high data rate of modern sCMOS cameras typically used in light-sheet microscopy, while still maintaining compression ratios comparable to more complex, but much slower algorithms (Supplementary Table 2).
Lossless compression is fundamentally limited by the algorithmic entropy of the data, and further reduction in size is only possible through loss of information. Most lossy compression algorithms (such as JPEG) compress images by preserving only those structures recognized by the human visual system [18], making compression artifacts imperceptible to the eye. Scientific images, however, represent sets of quantitative measurements and therefore their compression should instead preserve the numerical values of all pixel intensities within their uncertainties. Pixel values are subject to random noise that mainly consist of the photon shot noise and the camera read noise. Commonly used lossy compression algorithms often perform a quantization step in Fourier or wavelet space. Therefore, the deviation introduced by the compression at the single pixel level cannot be controlled. We developed a noise dependent lossy compression algorithm which processes each pixel individually and exploits the natural variability of pixel values. As a result, the user can specify the maximally tolerated pixel error in proportions of the inherent noise. Extending our lossless compression scheme, this is achieved by adding a variance stabilization step before the prediction, and quantizing the prediction residuals before the Huffman coding (Supplementary Note). Alternatively, the quantization and prediction steps can be swapped, which can be more suitable for methods that are sensitive to small scale pixel correlations, such as localization microscopy, albeit at the expense of compression ratio (Supplementary Fig. 2). We define the mode where the quantization step is equal to the noise (q=1σ) as within noise level (WNL) compression. Using this method, the compression ratio massively increases for all imaging modalities compared to the lossless mode ( Fig. 1c) without any apparent loss in image quality (Supplementary Fig. 3). Furthermore, the average compression error is considerably smaller than the image noise itself (Supplementary Fig. 4).
To see how this noise-dependent compression affects common imaging pipelines, we tested the effect of different levels of compression on 3D nucleus and membrane segmentation in light-sheet microscopy, and on single-molecule localization accuracy in superresolution microscopy. First, we imaged a Drosophila melanogaster embryo expressing an H2Av-mCherry nuclear marker in a MuVi-SPIM setup [6] and segmented the nuclei ( Fig. 2a and Online Methods). Then we performed noise dependent compression at various quality levels and calculated the segmentation overlap compared to the uncompressed stack (Online Methods). At WNL compression (q=1σ) the segmentation overlap is almost perfect ( Fig. 2b) with an overlap score of 0.996. Even when increasing the quantization step to 4σ (Fig. 2c) the overlap score stays at 0.98 and only drops below 0.97 when the compression ratio is already above 120 (quantization step of 5σ, Fig. 2d). We got similar results for a membrane segmentation pipeline that is used with Phallusia mammillata embryos (Online Methods and Supplementary Fig. 5).
Next, we evaluated our compression algorithm in the context of single molecule localization, and measured how the localization precision is affected by an increasing compression ratio. We compressed a single-molecule localization microscopy (SMLM) dataset of immunodetected microtubules (Fig. 2e) with increasing compression levels. For WNL compression (q=1σ) no deterioration of the image was visible (Fig. 2f), and even for the case of q=4σ the compression induced errors were much smaller than the resolvable features (Fig. 2g). To quantify the impact of compression on the localization error, we used a simulated dataset (Online Methods) and compared the localization output of different compression levels to the ground truth (Fig. 2h). Lossless compression resulted in a compression ratio of 2.7, whereas WNL compression reached a compression ratio of 5.0, while increasing the localization error by only 4%. This also coincides with the theoretical increase of image noise (Supplementary Note). Furthermore, the increase in localization error was not dependent on the signal to background noise ratio (Supplementary Fig. 6).
Our algorithm is implemented in C++, and allows for easy integration through an API with various programming languages. The library was tested on Linux (Ubuntu 16.04) and Windows (10). Additionally, we implemented a filter plugin for HDF5 which enables a seamless integration in all software packages that are supporting the native HDF5 library, such as Matlab, Python, Imaris, or Ilastik. Because of its versatility, HDF5 has emerged as the de facto standard in the open source light sheet microscopy field, and is also the basis for the widely used BigDataViewer [19] in Fiji [20]. When loading a B 3 D compressed image in an HDF5 enabled application, the library automatically calls our filter plugin, decompresses the image on the GPU, and copies it back into CPU memory. Due to B 3 D's efficient compression and its high decompression speed, loading data is often accelerated: For a state-of-the-art hard drive with 200 MB/s bandwidth, loading a 2 GB uncompressed 3D stack of images takes about 10 seconds. With an average compression ratio of 20 fold in the WNL mode, the loading time is reduced to 0.5 seconds followed by 2 seconds of decompression, which yields a factor of four speed-up. It is also worth to note, that the achieved WNL compression reduces the camera data rate to below 40 MB/s, well below the 1 Gb/s Ethernet standard. This enables to use current network infrastructure to move data to long term storage and even makes the use of cloud services possible. Altogether, B 3 D, our efficient GPU-based image compression library allows for exceptionally fast compression speed and greatly increases compression ratio with its WNL scheme, offering a versatile tool that can be easily tailored to any high-speed microscopy environment.   Table 3).

Figure 2. Effect of noise dependent lossy compression on image analysis outcome.
(a-h) Influence of noise dependent lossy compression on 3D nucleus segmentation. A Drosophila m. embryo expressing H2Av-mCherry nuclear marker was imaged in MuVi-SPIM [7], and 3D nucleus segmentation was performed (Online Methods) (a). The raw data was subsequently compressed at increasingly higher compression levels, and segmented based on the training of the uncompressed data. To visualize segmentation mismatch, the results of the uncompressed (green) and compressed (magenta) datasets are overlaid in a single image (b, c; overlap in white). Representative compression levels were chosen at two different multiples of the photon shot noise, at q=1σ (b) and q=4σ (c). For all compression levels the segmentation overlap score (Online Methods) was calculated and is plotted in (g) along with the achieved compression ratios. (e-h) Influence of noise dependent lossy compression on single-molecule localization. Microtubules, immunolabeled with Alexa Fluor 647 were imaged by SMLM (h). The raw data was compressed at increasingly higher compression levels, and localized using the same settings as the uncompressed data. To visualize localization mismatch, the results of the uncompressed (green) and compressed (magenta) datasets are overlaid in a single image (f, g; overlap in white). Two representative compression levels were chosen at q=1σ (f) and q=4σ (g). To assess the effects of compression on localization precision, a simulated dataset with known emitter positions was compressed at various levels. For all compression levels the relative localization error (normalized to the Cramér-Rao lower bound) was calculated and is plotted in (h) along with the achieved compression factors.

Compression benchmarking
For all presented benchmarks, TIFF and JPEG2000 performance was measured through MATLAB's imwrite and imread functions, while KLB and B³D performance was measured in C++. All benchmarks were run on a computer featuring 32 processing cores (2×Intel Xeon E5-2620), 128 GB RAM and an NVIDIA GeForce GTX 970 graphics processing unit. Read and write measurements were performed in RAM to minimize I/O overhead, and are an average of 5 runs.

Light-sheet imaging
Drosophila melanogaster embryos were imaged in our MuVi-SPIM setup [6] using the electronic confocal slit detection (eCSD) [21]. Embryos were collected on an agar juice plate, and dechorionated in 50% bleach solution for 1 min. The embryos were then mounted in a shortened glass capillary (Brand 100 μl) filled with 0.8% GelRite (Sigma-Aldrich), and pushed out of the capillary to be supported only by the gel.

3D nucleus segmentation
3D nucleus segmentation of Drosophila m. embryos was performed using Ilastik [22]. The original dataset was compressed at different quantization levels, then upscaled in z to obtain isotropic resolution. To identify the nuclei, we used the pixel classification workflow, and trained it on the uncompressed dataset. This training was then used to segment the compressed datasets as well. Segmentation overlap was calculated in Matlab (Supplementary Code) using the Sørensen-Dice index [23], [24]: where the sets and represent the pixels included in two different segmentations.

3D membrane segmentation
Raw MuVi-SPIM recordings of Phallusia mammillata embryos expressing PH-citrine membrane marker were kindly provided by Ulla-Maj Fiuza (EMBL, Heidelberg). Each recording consisted of 4 views at 90 degree rotations. The views were fused using an image based registration algorithm followed by a sigmoidal blending of the 4 views. The fused stack was then segmented using the MARS algorithm [25] with an hmin parameter of 10. The raw data (all 4 views) was compressed at different levels, and segmented using the same pipeline. Segmentation results were then processed in Matlab to calculate the overlap score for the membranes using the Sørensen-Dice index (Supplementary Code).

Single-molecule localization imaging
In order to visualize microtubules, U2OS cells were treated as in [26] and imaged in a dSTORM buffer [27]. In brief, the cells were permeabilized and fixed with glutaraldehyde, washed, then incubated with primary tubulin antibodies and finally stained with Alexa Fluor 647 coupled secondary antibodies. The images were recorded on a home-built microscope previously described [26], in its 2D single-channel mode.

Single-molecule localization data analysis
Analysis of single-molecule localization data was performed on a custom-written MATLAB software as in [28]. Pixel values were converted to photon counts according to measured offset and calibrated gain of the camera (EMCCD iXon, Andor). The background was estimated with a wavelet filter [29], background-subtracted images were thresholded and local maxima were detected on the same images. 7-pixel ROIs around the detected local maxima were extracted from the raw images and fitted with a GPU based MLE fitter [30]. Drift correction was performed based on cross-correlation. Finally, images were reconstructed by filtering out localizations with a high uncertainty (>30 nm) and large PSF (>150 nm) and Gaussian rendering.

Simulation of single-molecule localization data
Single molecule localization data was simulated in Matlab (Supplementary Code) by generating a grid of pixelated Gaussian spots with standard deviation of 1 pixel. With a pixel size of a 100 nm, this corresponds to a FWHM of 235.48 nm. The center of each spot was slightly offset from the pixel grid at 0.1 pixel increments in both x and y directions. To this ground truth image a constant value was added for illumination background, and finally Poisson noise was applied to the image. This process was repeated 10000 times to obtain enough images for adequate accuracy.

Code availability
Source code and compiled binaries, including a filter plugin for HDF5, will be made available for download at https://git.embl.de/balazs/B3D. For immediate access, please send a request to balint.balazs@embl.de.