Rapid and label-free identification of individual bacterial pathogens exploiting three-dimensional quantitative phase imaging and deep learning

For appropriate treatments of infectious diseases, rapid identification of the pathogens is crucial. Here, we developed a rapid and label-free method for identifying common bacterial pathogens as individual bacteria by using three-dimensional quantitative phase imaging and deep learning. We achieved 95% accuracy in classifying 19 bacterial species by exploiting the rich information in three-dimensional refractive index tomograms with a convolutional neural network classifier. Extensive analysis of the features extracted by the trained classifier was carried out, which supported that our classifier is capable of learning species-dependent characteristics. We also confirmed that utilizing three-dimensional refractive index tomograms was crucial for identification ability compared to two-dimensional imaging. This method, which does not require time-consuming culture, shows high feasibility for diagnosing patients with infectious diseases who would benefit from immediate and adequate antibiotic treatment.


MAIN
In recent years, the global society has been frequently exposed to the threat of infectious diseases, i.e., disorders caused by the invasion of microbes 1, 2 . Climate change and the expansion of human activities around the globe have allowed the spread of parasites from different sources. Despite the recent advances in biomedical techniques, the global society has paid significant tolls including public fear 3 and economic impact 4 , following the outbreaks of infectious diseases.
Among different groups of infectious microbes, bacteria have played a notable part in jeopardizing the public health 1, 5 . Tuberculosis, cholera, meningitis and numerous diseases have repeatedly emerged, giving striking blows to the international community. Furthermore, bacterial infections are the most common cause of sepsis, which has been indicated as a major cause of deaths world-wide 6,7 .
Identification of the pathogens can be instrumental in providing appropriate antibiotic treatments against bacterial infections. While bacterial infection can be effectively treated by prescribing appropriate antibiotic agents, the efficacy of antibiotics varies for different types of bacteria due to their different mechanisms [8][9][10][11] . The risks accompanying inappropriate antibiotic treatments further highlight the demand for identification of the pathogens [12][13][14][15] . Accordingly, multiple methods have been proposed to rapidly identify the microbial pathogens in bloodstream [16][17][18][19][20][21] .
Typical methods for identifying the microbial pathogens in bloodstream involve blood culture and matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) 22,23 . While MALDI-TOF MS offers the chemical composition of the specimen, it requires amplification of the microbial cells to 10⁵ or more cells. This prerequisite restricts the speed of identification using MALDI-TOF MS. In addition, MALDI-TOF MS alone may fail to identify mixtures of pathogens that fall under multiple species 24 .
To overcome the problems related to routine microbiological identification methods, we developed a rapid and accurate method for identifying bacterial species based on 3D (three-dimensional) quantitative phase imaging (QPI) and a convolutional neural network (CNN). After establishing a 3D refractive index (RI) tomogram database of 19 bacterial species, we trained a CNN classifier to predict species based on 3D RI tomograms of individual bacteria and provide interpretability. After training, the performance of the classifier was assessed using an unseen dataset. The trained classifier accurately predicted the species as well as gram-stainability or aerobicity. We confirmed that the classifier extracts species-dependent features in 3D RI tomograms according to the distribution of abstract features and saliency maps. Furthermore, the 3D context and robust contrast of 3D RI tomograms were crucial to the performance of the classifier.
The proposed approach was based on previous methods of identifying microscopic biospecimens based on QPI and machine learning [25][26][27][28][29][30][31][32][33] . QPI is a label-free holographic imaging technique used for measuring the optical phase delay of a specimen, which reveals the intrinsic biological properties of the specimen 34 . Based on its ability to rapidly and consistently characterize individual live cells, QPI provides suitable data for machine learning analysis.
Because of the challenges of detection and close relationship with pathology, bacteria have been used as subjects of identification based on QPI and machine learning. Previous studies achieved genus-level identification of individual bacteria based on 2D light scattering profiles 35 and conventional machine learning 26 , followed by the species-level screening of anthrax spores based on 2D phase images and a CNN 31 .
In this study, we aimed to strengthen the current countermeasure against infectious diseases by taking advantage of recent advances in QPI and deep learning 36 . 3D QPI techniques including optical diffraction tomography 37 (ODT) have expanded the understanding of live cells through 3D RI tomograms 34 . The usefulness of 3D RI tomograms in pathophysiological and immunological studies was verified by learning-based identification of blood cells 31,33 . By employing a state-of-the-art CNN 38 and advanced techniques for exploring abstract features in data 39,40 , we extracted intricate biological information from 3D RI tomograms.

RESULTS
In the proposed method, 3D RI tomograms of unknown specimens were classified into the most probable species based on a trained CNN classifier. The overall workflow of our approach is illustrated in Fig. 1. We established a 3D RI tomogram database of bacteria using QPI and ODT (Fig. 1a). Using QPI, phase and amplitude images of each specimen were measured in multiple illumination angles. Next, the 3D RI tomogram was reconstructed using the multi-angular set of phase and amplitude images. After establishing the 3D RI tomogram database of bacteria, the database was randomly separated into the training set, validation set, and test set. The CNN classifier was trained using the training set via gradient descent-based optimization. After selecting the best classifier for the validation set, classifier performance was tested on an unseen test set (Fig. 1b).
We acquired a total of 5041 3D RI tomograms of bacteria. To predict the clinical feasibility of our method, the database was composed of 19 bacterial species which were commonly isolated from clinical specimens at  Fig. 2a and common categorizations of each species are listed in Supplementary Table 1. Each tomogram covers a cuboid region of 7.2 × 7.2 × 4.0 μm³, with one or more specimens of the labelled species inside the cuboid. The voxel size was 0.1, 0.1, and 0.2 μm in the x, y, and z directions respectively. For each species, an identical number of tomograms was randomly assigned to each of the test set and validation set.
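The per-species assignment described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_per_species`, the parameters `n_val` and `n_test`, and the toy labels are all hypothetical, and the actual per-species counts are not stated in this section.

```python
import random
from collections import defaultdict


def split_per_species(labels, n_val, n_test, seed=0):
    """Return (train, val, test) index lists.

    The same number of validation and test samples is drawn at random
    from every species; the remaining samples go to the training set.
    """
    rng = random.Random(seed)
    by_species = defaultdict(list)
    for idx, species in enumerate(labels):
        by_species[species].append(idx)

    train, val, test = [], [], []
    for species in sorted(by_species):
        indices = by_species[species]
        if len(indices) < n_val + n_test:
            raise ValueError(f"not enough samples for {species}")
        rng.shuffle(indices)
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return train, val, test


# Toy labels: three placeholder species with unequal sample counts.
labels = ["A"] * 10 + ["B"] * 7 + ["C"] * 12
train_idx, val_idx, test_idx = split_per_species(labels, n_val=2, n_test=2)
```

Fixing the held-out count per species keeps the validation and test sets balanced even when the number of acquired tomograms differs between species.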

Identification of bacterial species from 3D RI tomograms
Our trained classifier was 94.6% accurate in the blind test of species classification (Fig. 2b). The average true positive rate (TPR), true negative rate (TNR), positive predictive value (PPV) and negative predictive value (NPV) for each species were 94.6%, 99.7%, 94.7%, and 99.7% respectively. The most frequent errors were misclassification of E. coli as K. pneumoniae and misclassification of S. pneumoniae as S. pyogenes, which occurred for 7.5% of E. coli and S. pneumoniae.
The high overall accuracy was achieved without significantly overlooking any species. The performance for each species is quantitatively listed in Supplementary Table 2. TPR, or sensitivity, was the highest (100%) for H. influenzae, L. monocytogenes, P. aeruginosa, and S. anginosus, and the lowest (85%) for P. mirabilis. TNR, or specificity, was the highest (100%) for P. mirabilis, P. aeruginosa, and S. maltophilia, and the lowest (99.2%) for E. coli. PPV, or precision, was the highest (100%) for P. mirabilis, P. aeruginosa, and S. maltophilia, and the lowest (85.4%) for E. coli. NPV was the highest (100%) for H. influenzae, L. monocytogenes, P. aeruginosa, and S. agalactiae, and the lowest (99.2%) for P. mirabilis. Note that none of TPR, TNR, PPV, or NPV was lower than 85% for any of the 19 species.
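For reference, the four per-species metrics quoted above follow from a multi-class confusion matrix by treating each species one-versus-rest. The sketch below uses a made-up 3-class matrix, not the paper's data, and the function name `one_vs_rest_metrics` is hypothetical.

```python
def one_vs_rest_metrics(confusion):
    """Compute TPR, TNR, PPV and NPV for every class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                        # true k, predicted other
        fp = sum(confusion[i][k] for i in range(n)) - tp   # true other, predicted k
        tn = total - tp - fn - fp
        metrics.append({
            "TPR": tp / (tp + fn),  # sensitivity
            "TNR": tn / (tn + fp),  # specificity
            "PPV": tp / (tp + fp),  # precision
            "NPV": tn / (tn + fn),
        })
    return metrics


# Toy confusion matrix with made-up counts (rows: true, columns: predicted).
toy = [
    [9, 1, 0],
    [2, 8, 0],
    [0, 0, 10],
]
per_class = one_vs_rest_metrics(toy)
accuracy = sum(toy[k][k] for k in range(len(toy))) / sum(map(sum, toy))
```

The sketch does not guard against empty denominators, which occur when a class is never predicted (PPV) or has no samples (TPR).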
The classifier showed low confusion between species in the same genus. Our database consists of 19 species in 15 genera, with two species belonging to Staphylococcus and four species belonging to Streptococcus. The classification was 95.5% accurate for these genera, which is slightly more accurate than for the species (Supplementary Fig. 1a). Genus-related errors were not increased compared to those among all 19 species. Once the genus was correctly predicted, the predictions within Staphylococcus and Streptococcus were 98.7% and 96.2% accurate respectively (Supplementary Fig. 1b).
The results of species classification were also evaluated by two additional types of categorization, gram-stainability and aerobicity, which are closely related to the pathogenesis of infectious diseases and resistance to antibiotics [8][9][10][11] (Figs. 2c, d). The classification results were 97.5% and 98.2% accurate for gram-stainability and aerobicity respectively. Whereas an increase in accuracy for simpler classification tasks is not unexpected, other classification tasks related to treating the bacterial infection can be performed using the trained neural network.
Other categorization criteria including motility and morphological groups were accurately identified using the classifier (Supplementary Figs. 1c, d). Motile bacteria and non-motile bacteria were distinguished with 98.2% accuracy, although the 3D RI tomogram did not directly measure their movements. The 19 bacterial species were morphologically grouped into bacilli, cocci, and coccobacilli, reflecting rod-shaped bacteria, round bacteria, and intermediate shapes. The prediction among the three groups showed 98.0% accuracy.
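One way to score the coarser categorizations discussed above (genus, gram-stainability, aerobicity, motility, morphology) is to map both the true and the predicted species to their group and count agreements. The sketch below assumes this mapping-based evaluation. The function name `grouped_accuracy` and the toy predictions are hypothetical, and the lookup covers only four of the species named in the text.

```python
def grouped_accuracy(true_species, predicted_species, group_of):
    """Fraction of samples whose predicted species falls in the same
    group as the true species, according to the lookup `group_of`."""
    hits = sum(
        group_of[t] == group_of[p]
        for t, p in zip(true_species, predicted_species)
    )
    return hits / len(true_species)


# Gram-stain lookup for four species mentioned in the text.
GRAM = {
    "E. coli": "negative",
    "K. pneumoniae": "negative",
    "S. pneumoniae": "positive",
    "S. pyogenes": "positive",
}
# Identity lookup, so the same function also gives species-level accuracy.
SPECIES = {name: name for name in GRAM}

# Toy predictions (made up for illustration).
true = ["E. coli", "E. coli", "S. pneumoniae", "S. pyogenes", "K. pneumoniae"]
pred = ["K. pneumoniae", "E. coli", "S. pyogenes", "S. pyogenes", "S. pyogenes"]

species_acc = grouped_accuracy(true, pred, SPECIES)
gram_acc = grouped_accuracy(true, pred, GRAM)
```

Because an error between two species of the same group still counts as a hit, grouped accuracy can never be lower than species-level accuracy, which is consistent with the higher figures reported for the simpler tasks.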

Species-dependent features in 3D RI tomograms
To verify and interpret the feature-extraction capability of the trained classifier, we utilized techniques for interpreting CNN inferences. First, we investigated the distribution of latent features extracted by the classifier.
We mapped the high-dimensional feature space onto a 2D plane using t-distributed stochastic neighbour embedding 39 (t-SNE), which is a prominent unsupervised dimensionality reduction technique for deep learning (Fig. 3a). Features were extracted before the final fully-connected layer (See Supplementary Fig. 4).
Species-wise clustering in the feature space indicated that the convolutional filters were appropriately optimized for extracting species-dependent features, even for previously unconsidered tomograms in the test set. The distribution of misclassified data in the feature space was also highlighted in the visualization (marked using  signs). The feature vector of each misclassified data point was relatively distant from the correctly classified data of the identical species.
However, not all distant data were misclassified. Correct classifications were occasionally made even though the extracted feature was distant from other data of the identical species. This indicates that the final fully-connected layer was optimized to track the complex decision boundary from the incompletely clustered feature space.
The relationship between the feature distribution and gram-stainability/aerobicity was also investigated by t-SNE (Fig. 3b). Neither gram-stainability nor aerobicity was separable into a simple bimodal distribution in the feature space. However, a moderate tendency for anaerobic bacteria was observed; gram-positive anaerobic bacteria and gram-negative anaerobic bacteria were biased in two different directions.
Additional analysis of the misclassified data was conducted in relation to the learned features. We compared correctly classified data and misclassified data in the 3D RI tomograms and corresponding saliency maps highlighting the voxels relevant to the prediction. The tomograms and Grad-CAM++ 40 saliency maps of two data sets from B. subtilis are presented in Figs. 3c, d. While the correctly classified specimen was slightly tilted from the lateral plane, like most B. subtilis specimens, the misclassified specimen was directed straight up, unlike any of the specimens in the training dataset. The misclassification may have occurred because of peculiarities in the data rather than optimization of the classifier, considering the high contrast in the saliency maps for both correctly classified and misclassified data.

Characteristics of 3D RI tomograms for identifying bacterial species
We next examined whether a 3D RI tomogram is more suitable for classification than other types of data. To further investigate the high performance of our 3D classifier, we used various types of input data to train new classifiers for comparative analysis (Fig. 4a). A total of six types of data were compared: a 2D optical phase delay image, a set of 2D optical phase delay images from multiple illumination angles, a 3D RI tomogram, and threshold-based binary masks of the three types of images. The three types of images and the corresponding classification approaches are referred to as 2D, multi-view, and 3D throughout this section. Thresholds in the 2D phase and 3D RI were set to 0.05 and 1.3425, respectively, to accommodate a wide variety of phase and RI values in bacteria (Supplementary Fig. 2).
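The threshold-based binary masks can be sketched as below. Only the two threshold values come from the text. The strict inequality, the (z, y, x) axis order, the background value of 1.337, and the synthetic block standing in for a bacterium are assumptions for illustration.

```python
import numpy as np

RI_THRESHOLD = 1.3425    # 3D RI threshold quoted in the text
PHASE_THRESHOLD = 0.05   # 2D phase-delay threshold quoted in the text


def binary_mask(data, threshold):
    """Return a 0/1 mask (uint8) marking elements above the threshold."""
    return (np.asarray(data) > threshold).astype(np.uint8)


# Synthetic tomogram on a 20 x 72 x 72 grid (z, y, x): a uniform
# background with a small block of higher RI standing in for a cell.
tomogram = np.full((20, 72, 72), 1.337)
tomogram[8:12, 30:40, 30:50] = 1.38
mask_3d = binary_mask(tomogram, RI_THRESHOLD)

# Synthetic 2D phase image with the same lateral footprint.
phase = np.zeros((72, 72))
phase[30:40, 30:50] = 0.3
mask_2d = binary_mask(phase, PHASE_THRESHOLD)
```

A mask of this kind keeps only the shape and position of the specimen and discards the RI or phase contrast inside it, which is what the comparison with the original data isolates.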
A comparative study revealed that a 3D RI tomogram contained the most suitable data for species classification.
The accuracies were 74.2%, 92.2%, and 94.6% for 2D, multi-view, and 3D tomograms respectively, and 49.6%, 75.9%, and 92.4% for the corresponding binary masks. Classification based on the original data was more accurate than classification based on the binary mask counterpart for all 2D, multi-view, and 3D approaches. The accuracy gap was more significant in the 2D and multi-view approaches than in the 3D approach.
We examined the properties of 3D data which rendered it more suitable for species classification compared to 2D and multi-view data. We focused on the difference between the multi-view and 3D data because multi-view data includes 2D data and contains identical physical information as the 3D data in principle. For the comparison, the 3D/multi-view data and saliency maps of a specimen correctly classified only in the 3D approach were examined (Figs. 4b-e).
The 3D data provided the 3D context to the CNN more robustly than the multi-view data. The 3D context was directly expressed in the 3D data as the axial coordinate, whereas in multi-view data it was only implicit in the horizontal distortion of the image towards the illumination direction. The saliency maps indicated that the 3D context is more accurately recognized in the 3D data. The 3D saliency map captured the axial location and boundary of the specimen, while the multi-view saliency map did not display a clear sign that the implicit 3D context was recognized. This agrees with the observation that 3D data is more tolerant of defocusing compared to multi-view data.
Additionally, the 3D data gave a more consistent image contrast compared to multi-view data. Imaging with coherent light sources is often accompanied by artefacts from undesired interferences caused by multiple reflection or scattering by dust particles. Imaging of bacteria which are only hundreds of nanometres thick is particularly vulnerable to these artefacts. During the reconstruction of the 3D data, the signal-to-noise ratio can be improved because the noisy artefacts are angle-dependent, whereas the influence of the specimen remains consistent. Accordingly, the 3D saliency map exhibits a higher focus on the specimen compared to the multi-view saliency map.
Notably, more rapid identification of bacteria is feasible at the cost of accuracy. Although the multi-view approach is less accurate, it does not require the computational reconstruction step of the 3D approach. In addition to numerically applying the principle of ODT, 3D RI reconstruction often involves sample-dependent regularization which may be computationally burdensome.

DISCUSSION
We used deep learning and 3D QPI to establish a rapid and accurate framework for identifying bacterial pathogens.
The CNN classifier was successfully optimized to classify the 3D RI tomograms of 19 bacterial species, achieving 94.6%, 97.5%, and 98.2% accuracy in the identification of species, gram-stainability, and aerobicity. We confirmed the representation learning capability of our classifier by exploiting the dimensionality reduction and saliency maps. Finally, we explored the advantages of 3D RI tomograms for classifying bacteria.
For practical diagnostic application, it is necessary to integrate the proposed method with a rapid bacteria isolation technique. Recent advances in microfluidic engineering and antibody engineering have achieved isolation of bacteria within 1 h from concentrations as low as that of pathogenic blood 41 . By employing the isolation techniques, an efficient diagnostic routine can be developed.
The greatest advantage of the proposed method is its speed. One can identify a new bacterial specimen in a few minutes once the classifier has been trained. Acquiring a 3D RI tomogram of individual bacteria does not require time-consuming processes such as culturing or staining, and the trained CNN classifier consumes only milliseconds or less to make predictions using input data. Combining the bacteria isolation system, tomographic imaging system, and classifier will result in rapid identification in practice.
Our proposed method has important advantages compared to other methods for identifying bacterial species.
An overall comparison between different techniques is displayed in Table 1. Biochemical methods including bacteriophage susceptibility test 20 , antiserum test 16 , DNA microarray 19,21 , and real-time PCR 17 can be utilized to identify bacterial species. However, these invasive methods also involve hour-long to day-long processes and necessitate the management of specialized biochemical agents, on account of the heavy reliance on biochemical reactions or molecule-specific signals. In contrast, the proposed method is non-invasive and rapid due to the label-free nature and the single-cell characterization capability of ODT. Also, our method is less vulnerable to human factors during both the measurement and analysis compared to other methods. While the other methods vary in protocol among institutes and experimenters, the proposed method exploits consistent measurements in a computational manner. Our method also provides visual interpretations of the prediction, allowing for its incorporation into the current diagnostic workflows.
We particularly underline specific advantages of our method compared to MALDI-TOF MS 22,23 , which is the current gold standard. First, our method does not require time-consuming amplification of the specimen whereas MALDI-TOF MS generally requires a certain amount. This is because 3D QPI is capable of profiling bacteria at the single-cell level. Second, our method can be used in the presence of multiple types of bacteria. MALDI-TOF MS may fail in instances of mixed infection. On the other hand, repetitive investigation using our method can identify multiple types of bacteria, because of the single-cell profiling capability. Finally, our method preserves the specimen due to the non-invasive nature of 3D QPI. While MALDI-TOF MS requires ionization of the specimen, 3D QPI simply measures the light scattered by the specimen, leaving it intact. Therefore, our rapid method can be followed by a larger number of assays, such as MALDI-TOF MS itself, without expending the limited specimen.
We plan to extensively investigate the capability of our method. An essential step is to confirm the performance in clinically obtained bacteria. We achieved success in the demonstration based on laboratory-cultured bacteria of preserved isolates, yet the grounds for clinical application can be presented by verification using clinical material.
Furthermore, the scalability of the method to a wider and more complex database should be studied. In other words, the performance of the method in non-bacterial microbes such as fungi and in intra-specific strains is yet to be explored.
CNN can be effectively coupled with 3D QPI, i.e., an appropriate tool involving the use of complicated 3D RI tomograms for identifying or analysing biomedical phenomena. CNN has mainly been utilized for exploring 2D images because of the accessibility of 2D images. However, CNN can extract the spatial context from high-dimensional data by using 3D data. The commercialization of 3D QPI techniques such as ODT has allowed the accumulation of 3D RI tomograms related to live biomedical phenomena. However, the current understanding of the relationship between the RI distribution and biomedical phenomena remains limited because of data complexity and the lack of molecular specificity. Moreover, there are some correlations between RI and molecules, although RI does not immediately present the molecular distribution 42 . The complex 3D context of RI can be extracted by CNNs to obtain implicit biochemical information 32 . Cooperative utilization of 3D QPI and CNN can further bridge the gap between the RI distribution and complex biomedical phenomena.

Preparation of bacteria
The bacteria were prepared by culturing frozen glycerol stocks in liquid media. The frozen stock of each species was stored at -80°C and thawed to room temperature (25°C) before use. The thawed stock was stabilized in the culture medium suitable for the species in a shaking incubator at 35°C. The stabilized bacteria were streaked onto agar plates containing a suitable culture medium. The agar plates were incubated at 35°C until single colonies were directly observed. Subcultures of individual colonies were grown in a shaking incubator at 35°C until the concentration of bacteria reached 10⁸-10⁹ colony-forming units per millilitre. The subculture was washed once using fresh medium and twice using phosphate-buffered saline. Each washing process was preceded by centrifugation at 8000 × g for 3 min to isolate the bacteria. The prepared solution was diluted with phosphate-buffered saline to a concentration suitable for imaging.

Optical measurement
3D RI tomograms were acquired using a Mach-Zehnder 3D QPI setup. Specifically, a set of multidirectional 2D QPI measurements was assembled in the spatial frequency domain to a 3D RI tomogram under the principle of ODT. Each 2D QPI measurement was conducted based on off-axis holography using the Mach-Zehnder interferometer illustrated in Supplementary Fig. 3. The direction of the light illuminated onto the specimen was controlled using a digital micro-mirror device 43 . Using a continuous wave laser with a wavelength of 532 nm and two water-immersion objective lenses with a 1.2 numerical aperture, the optical resolution was 110 nm in the horizontal axis and 330 nm in the vertical axis according to the Nyquist theorem. We resampled the 3D RI tomograms with 100-nm wide and 200-nm high voxels. Detailed descriptions of the computational 3D QPI and ODT can be found elsewhere 37 .
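As a quick arithmetic check of the figures above, the lateral resolution is consistent with the commonly used ODT estimate of wavelength / (4 × NA) when illumination and detection share the same numerical aperture. That this is the expression the authors applied is an assumption. The axial figure depends additionally on the refractive index of the medium and is not reproduced here. The grid size uses the 7.2 × 7.2 × 4.0 μm field of view stated in the Results.

```python
# Values quoted in the text.
wavelength_nm = 532.0
numerical_aperture = 1.2

# Assumed lateral resolution estimate for ODT: lambda / (4 * NA).
lateral_resolution_nm = wavelength_nm / (4 * numerical_aperture)

# Number of voxels per axis after resampling (x, y, z).
fov_um = (7.2, 7.2, 4.0)    # field of view per tomogram
voxel_um = (0.1, 0.1, 0.2)  # resampled voxel size
grid_shape = tuple(round(f / v) for f, v in zip(fov_um, voxel_um))

print(f"lateral resolution ~ {lateral_resolution_nm:.1f} nm")
print(f"voxels per tomogram (x, y, z): {grid_shape}")
```

Under these assumptions the estimate is about 110.8 nm, in line with the quoted 110 nm, and each tomogram spans 72 × 72 × 20 voxels.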

Classifier optimization
To fully exploit the volumetric information of 3D RI tomograms, we utilized a neural network with 3D convolutional layers. The architecture design of the CNN classifier is illustrated in Supplementary Fig. 4. The architecture was inspired by the densely connected convolutional network (DenseNet) 38 , which has shown good performance in recognition tasks for images [44][45][46][47] . To prevent the loss of key features, we did not use the max-pooling layer which follows the initial convolution in the original DenseNet. In total, the CNN consisted of 169 convolutional layers wrapped in 30 dense blocks, followed by a global average pooling and fully-connected layer. The classifier in the 2D and multi-view approaches was optimized in a similar manner. In the 2D approach, the architecture was the 2D counterpart of the above-mentioned architecture. In the multi-view approach, predictions of 2D images were ensemble averaged to make the prediction for multi-view data. The batch size was 1024 and 512 in the 2D and multi-view approaches, respectively. The optimizer used in the 3D approach was the same as that used in the 2D and multi-view approaches.

Fig. 4. a, Comparison between classification performances of neural network classifiers using a single 2D phase image, a multi-angular series of 2D phase images, and a three-dimensional (3D) refractive index (RI) tomogram. 2D, multi-view, and 3D refer to the input images respectively. b-c, Representative specimen correctly classified using the 3D RI tomogram but misclassified using the multi-view phase images. b, 3D RI tomogram of the Haemophilus influenzae specimen. c, Four of the multi-view phase images of the identical specimen. d-e, Visualization of species-related patterns in b-c using Grad-CAM++. d, Grad-CAM++ saliency map corresponding to the 3D RI tomogram. e, Grad-CAM++ saliency map corresponding to the multi-view phase images. Scale bar = 2 μm.
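The ensemble averaging used for the multi-view approach in Classifier optimization can be sketched as follows. The text does not say whether class probabilities or raw network outputs were averaged. This sketch assumes softmax probabilities, and the function names and toy logits are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def ensemble_predict(per_view_logits):
    """Average per-view class probabilities and pick the top class.

    per_view_logits has shape (n_views, n_classes): one row of 2D
    classifier outputs for each illumination angle of one specimen.
    """
    probs = softmax(np.asarray(per_view_logits, dtype=float))
    mean_probs = probs.mean(axis=0)
    return int(mean_probs.argmax()), mean_probs


# Toy outputs for four views and three classes (made up).
logits = np.array([
    [2.0, 1.0, 0.0],
    [0.5, 1.5, 0.0],
    [3.0, 0.0, 0.0],
    [1.0, 1.2, 0.0],
])
label, mean_probs = ensemble_predict(logits)
```

In this toy case two of the four views individually favour class 1, yet the averaged probabilities favour class 0, which illustrates how a few confident views can outweigh several ambiguous ones.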