Abstract
Holographic cytometry is introduced as an ultra-high throughput implementation of quantitative phase image based on off-axis interferometry of cells flowing through parallel microfluidic channels. Here, it is applied for characterizing morphological changes of red blood cells during storage under regular blood bank condition. The approach allows high quality phase imaging of a large number of cells greatly extending our ability to study cellular phenotypes using individual cell images. Holographic cytology measurements show multiple physical traits of the cells, including optical volume and area, which are observed to consistently change over the storage time. In addition, the large volume of cell imaging data can serve as training data for machine learning algorithms. For the study here, logistic regression is used to classify the cells according to the storage time points. The results of the classifiers demonstrate the potential of holographic cytometry as a diagnostic tool.
Introduction
Flow cytometry is a powerful tool that measures multiple physical parameters of complex populations of flowing cells over a short period of time [1]. Since its development as a primarily cell-size measuring instrument, flow cytometry has evolved into a sophisticated modality that can simultaneously detect up to 14 optical and fluorescence parameters [2]. As a diagnostic tool, flow cytometry can be used to obtain information about the biochemical, biophysical, and molecular aspects of broad populations at the single cell level. Structural and morphological characteristics of cells can be derived from scattered light measurements while fluorescence emission is used to provide molecular information by estimating the number of fluorescent probes bound to various cellular components [3].
As an extension, imaging flow cytometry has become a valuable tool in the past decade that enables greater analysis of cellular morphology to provide additional information to biological studies and clinical diagnosis [4–6]. In-depth imagery of individual cells can be used to verify the parameters measured from conventional flow cytometry by obtaining information such as the detailed shape of the cells and the location of labeled biomolecules within them [7– 9]. Also, false positive occurrences can be reduced by analyzing cell images to eliminate non-cell objects such as debris and clusters of cells [10–12]. Therefore, users can be more definitive about the outcome of flow cytometry analysis by having access to sample attributes within the cell images.
Another tool that has been recently developed for cell imaging is quantitative phase microscopy (QPM) which can characterize biological cells by their morphological and spectral features with nanometer levels of precision [13–16]. Currently, the number of cells analyzed by QPM is greatly limited by the need for manual manipulation of the sample stage to find a good field of view. To translate the technique to be a diagnostically significant screening tool, it is necessary to increase the throughput of the system to reach levels similar to those of imaging flow cytometry which ranges from 1,000 to 100,000 cells/second [4]. Therefore, further development of QPM as a diagnostic approach requires the throughput of the system to be significantly increased.
To demonstrate the potential utility of a high throughput QPM method such as holographic cytology, the technique is used to assess the morphological changes of red blood cells (RBCs) that occur during storage, often termed storage lesion. The number of red blood cells that can be imaged with this system is hundreds of times greater than previous studies which have examined RBCs with QPM [13]. Characteristics of storage lesion include morphological, rheological and biochemical changes to RBCs which reduce their lifetime in circulation [17]. The change in RBCs during storage is of interest since there is an increase in patient risks with transfusion of older blood and thus significant efforts have been made to improve the viability of stored RBCs [18]. In addition, the stored RBCs are used to boost the hemoglobin and increase the athletic performance in blood doping. Currently, however, there is no way to effectively measure the viability of RBCs at the cellular level with high enough throughput to be useful for evaluating units for transfusion. Instead, the clinical standard is to depend on the observed percentage of hemolysis as an assay, neglecting the significant changes in RBC morphology and mechanical properties which can reflect their reduced viability.
In this article, holographic cytometry is presented as a quantitative phase imaging system that can acquire a large number of images of cells in flow by incorporating customized microfluidic chips and stroboscopic illumination. The approach is used to acquire quantitative phase images of a broad population of individual RBCs at different storage time points. To classify these cells, a set of morphological parameters are extracted from each cell image to enable characterization of millions of cells at the single cell level. The extracted physical parameters show consistent changes over time and suggest new avenues for improved understanding of the effects of storing RBCs. Further, the extracted parameters are used to train a supervised learning algorithm, based on logistic regression, to classify the cells according to their storage time points.
Materials and Methods
Blood Collection and Preparation
Whole blood from five different healthy donors (Table 1) are collected into CPD- OPTISOL [AS-5] collection sets.
Each unit of whole blood was stored at 1-6°C for up to 6 weeks for the experiment. The stored blood is removed from the units at specified time points using a sterile-docking system with a fitted valve. The removed blood was centrifuged at 1000 rpm for 5 minutes to isolate the RBC pellets which were resuspended as a 0.8% hematocrit solution in 5mL of high refractive index medium (RI = 1.372 at 23°C). The mixture is then loaded onto a syringe pump system flowing at 3µL/min for the experiment.
Development of Microfluidic Channels
The following experiments were performed using PDMS channels. The mask design is shown in Figure 1. Since the width of the channels, 40μm, is much larger than the average diameter of the RBCs, the passages are free from blockages that could potentially disrupt the uniform flow across the channels. In addition, the height of the SU-8 master mold, 5.39 + 0.18 μm, is comparable to the thickness of the RBCs.
Holographic Cytometry
The RBCs flowing through the channels were imaged using the holographic cytometry shown in Figure 2.
The illumination beam from a laser source (λ0 = 640nm, Δλ = 0.7nm) is modulated at 300Hz using an acousto-optic modulator (AOM). The AOM is triggered to synchronize with the frame rate of the camera using an Arduino based microcontroller. The pulse width of the modulated beam is 350μs and the spatial noise of the system is observed to stabilize at the 1nm rms level, with an intensity that is lower than the saturation level of the detector. The pulsed illumination beam provides a stroboscopic effect such that the motion of the cells, which are flowing at maximum velocity of 2.65mm/s, is effectively frozen during the exposure. Over the duration of one individual pulse of light, the RBCs travelling at this maximum velocity move only 0.93μm, which is less than the spatial resolution of the system (0.98μm), and therefore motion artifacts are minimized.
The output beam from the AOM is coupled into a 1×2 fiber coupler which splits the beam into the sample and the reference arms for the off-axis Mach-Zehnder interferometer. The collimated beam from the sample arm passes through the sample and the image is magnified by 33x using an objective lens (20x, 0.4NA). The magnified beam is combined with the collimated reference beam at an angle using a beam splitter to create an off-axis interferogram which is captured by the camera (Dalsa, 4096×96px, 300fps). For each sample, 99 sets of 10,000 frames (∼33 seconds each set) are captured at each storage time point.
The interferograms are post-processed frame-by-frame to obtain optical phase delays, Δϕ, of the wavefronts propagating through the sample which depends upon refractive index as: where n is the refractive index, h is the height of the sample, and λ refers to the wavelength. An example of a phase image sequence of stored RBCs flowing through the channels is shown in Figure 3.
Briefly, the complex field information describing the flowing RBC was retrieved from the interferograms [14]. Each complex field image is digitally refocused by propagating the field to the plane of best focus, as determined by the minimum variance of amplitude of the cells in the first 3 frames [19]. The cells in the subsequent frames of the flow sequences are digitally refocused using the same distance determined in the first 3 frames since the axial positions are restricted during flow by the height of the customized channel which is comparable to the cell thicknesses [20]. After background subtraction, any remaining noise is removed using a 3rd-order polynomial fit to the field of view excluding the cells. Then, each RBC is segmented from the image using an area and thickness threshold.
Multiple images of the same cell are identified using a modified tracking code from the Computer Vision Toolbox in Matlab [21]. The tracking code first detects moving objects from a given FOV. Then, it predicts the subsequent location of the object based on the previously observed motion using a Kalman filter. The predicted locations across the frames are used to form motion tracks which identify the moving copies of each single cell object. This ensures that identified copies of multiple images of the same cell do not account for multiple counts in the total number of cells imaged by the system.
Since the RBCs are isolated from the background based only on two morphological parameters, another quality check is performed to ensure that non-RBC objects are not included in the analysis as shown in Figure 4 below.
As can be seen in Figure 4A, there are groups of segmented objects that do not exhibit morphological features that correspond to the physical properties of RBCs based on size and dynamic structure. Therefore, objects with area below the threshold of 21μm2 as well as those with standard deviation of phase below 0.09 are categorized as non-RBC objects. Typical images of these rejected cells are shown in Figure 4C and D. These excluded objects may include platelets or cell fragments which are omitted from further analysis. In addition, some of the segmented objects are multiple cells clumped together during flow as shown in Figure 4B. The clumped cells that are selected based on the circularity (circularity < 0.85 or >1.23) and optical volume (OV > 5fL) are also excluded from further analysis (typical images shown in Figure 4E).
The set of morphological parameters used in our previous study [13] and 2 additional features, solidity and circularity, are extracted for each RBC using the segmented images. Solidity is the ratio between the area of the cell and the convex hull, with area defined by the bounding region that connects the outer points of the cell [22]. Circularity is the ratio between the perimeter and the area of the cell that should resemble a circle for values close to 1.
Results
Total Number of Cells
The total number of unique, single RBCs imaged with the system for all samples on day 1, 15, and 29, after excluding non-RBC objects, are 9,437,349 cells as shown in Table 2 below.
The total imaging time for each sample on a given day is ∼3300s, comprising 99 sets, 10,000 frames per set, at a rate of 300 fps. As can be seen in the table above, the throughput of the system ranges from 36 cells/s for Sample D at day 15 to 420 cells/s for Sample B at day 15 The number of RBCs imaged with the system for each storage day and sample is much greater, by a factor ∼300x or higher, than the total number of RBCs imaged in our previous studies of RBCs using translatable sample stages [13]. The capability to image such a large number of cells will provide a more accurate representation of the total population within the storage units than possible with other methods.
Characterization of Morphological Changes Over Time
Out of the 25 morphological parameters examined here, optical volume and area show the most consistent changes over storage time across most of the samples. The average optical volume over the storage time for all samples are shown as line plots in Figure 5A.
As can be seen in the line plots, the optical volume consistently decreases over the storage time for all samples except Sample A, which declined at week 3, but paradoxically rises at the Week 5 time point. The change in the parameter over the storage period is also illustrated using the normalized histograms in Figure 5B where the optical volume of the cells decreases over the storage period for Samples B-E.
Similarly, the area of the cells follows the trend as shown by the mean value of this parameter with line plots in Figure 6A.
Generally, the area of the cells also decreases over time except for Sample A, at the final time point, Week 5. The change in this parameter over the storage period is also shown using normalized histograms in Figure 6B. It should be noted that the decrease in cell area for Samples B-E may occur at different time points. For Samples B, C, and E, there is a gradual decrease in the parameters over the storage period. For sample D, both parameters are similar at weeks 1 and 3 but then exhibit an abrupt change in week 5. Previous studies have shown that morphological change is not uniformly associated with age [23] and therefore, the different time points where the changes in physical parameters appears could be attributed to sample variations. There is a known donor variability that depends on factors such as age, sex, and environmental attributes [24,25] that is also well represented by the difference in the time dependent trends between the donors for both parameters as shown in Figures 5 and 6.
Classification of RBCs using Logistic Regression
The extracted morphological parameters were used to train logistic regression to classify the cells according to their storage time. In order to efficiently train the classifiers, their performance is evaluated as a function of number of cells imaged using a binary classification between cells in week 1 and 5. The algorithms are built with randomly selected training data and then tested on the remaining dataset and repeated 10 times. The mean and the standard deviation of the classification accuracy for the different training data sizes are represented as error bar plots, shown below in Figure 7.
For all the samples, the accuracy of the algorithms increases with the number of the training data until the gain in the performance becomes insignificant. The improvement in the performance of the algorithms is comparable to the standard deviation of the accuracies across the trials when the number of training data images used in the algorithm is 4096. Therefore, the classifiers are trained with 5000 randomly selected cell images from each class while the performance is evaluated using the remaining dataset. The process is repeated 10 times to get the mean and the standard deviations of the classification accuracies.
Self-Trained Binary Classification: Week 1 vs Week 5
The classification performances for the cells in week 1 vs week 5 using the algorithms trained on the same population are shown in Table 3. When the algorithm is trained with cells from Sample A, the classifier shows the highest performance with 85.6% accuracy where 90.8% and 81.5% of the cells from Sample A are correctly identified as cells from week 1 and 5, respectively.
All Sample Binary Classification: Week 1 vs Week 5
As a further examination, when the algorithm is trained with cells randomly selected from all the samples, its performance across the different donors is shown in Table 4.
The classifiers trained with the cells from multiple samples are able to distinguish the storage time point of the cells from the different donors with 69.9% accuracy where 76.1% and 65.1% of the cells are classified correctly as cells from week 1 and 5 respectively.
Discussion
Throughout the study, holographic cytometry is used to image a large number of RBCs from stored blood units at the individual cell level to identify morphological changes over the storage period. By imaging cells flowing through the channels of the microfluidic chips, the throughput of holographic cytometry is no longer limited by the manual translation of sample stages. The use of multiple parallel channels further increases throughput while the use of stroboscopic illumination allows for a high flow rate without image streaking. At the maximum throughput of the current setup, it takes less than one second to image hundreds of cells, which is the same number of cells from our previous experiments [13]. Therefore, holographic cytometry can efficiently expand the utility of QPM by imaging a significantly greater number of cells relative to traditional QPM imaging systems.
One of the advantages of imaging a large number of cells is that the sensitivity of the system can be characterized at a much higher level of precision, enabling evaluation of rare conditions such as blood infections with low parasitemia percentages [26]. The diagnostic capability of a system will be limited by the lowest number of rare target cells that can be detected within a population. Therefore, by imaging a greater number of cells, the clinical utility of the system as a screening tool can be defined based on classification performance without the lower bound restriction arising from the total number of acquired cell images.
Out of the 25 morphological parameters extracted from the images, OV and area show a consistent decrease over the storage time for all the samples except Sample A, at week 5 as shown by the Figures 5 and 6. A possible explanation for the rise in the number of cells with decreased area over the storage period could be due to the previously reported transformation of discocytes to morphologically altered RBCs with smaller projected surface areas such as echinocytes, spheroechinocytes, and spherocytes [27–29]. One of these studies has shown that morphologically altered RBCs accounted for 4.9% of the blood population on day 3 which increased to 23.6% by day 42 [28]. Therefore, the fairly consistent decrease in area over the storage time observed here can be a result of the echino-spherocytic shift of RBCs during storage. Previous studies have also shown a decrease in MCHC (mean corpuscular hemoglobin concentration) for stored RBCs. In the QPM measurements, this corresponds to a net decrease in refractive index of the RBCs relative to the high RI medium, which would result in a decreased OV [29,30]. It should also be noted that the physical parameters of the cells from Sample A do not continue to decrease at the week 5 storage time, unlike the cells from Sample B-E that show steady changes over the storage time. This exception to the trend may have arisen from variations in sample handling or differences in the characteristics between the donors. In order to fully understand this, RBCs from a greater number of different donors must be examined.
Logistic regression, trained by the large data set of extracted parameters, is used to classify the cells according to the storage time points. Learning curves, shown in Figure 7, compare the classification accuracies relative to the training data set size to determine the optimal number of cell images to use. For all samples, the increase realized in classification accuracies is comparable to the standard deviation seen for the variation in performance across trials when there are ∼4096 cell images used as training data. Therefore, a threshold of 5000 randomly selected cells from each sample are viewed as sufficient to build the classification algorithms, and the remaining cell imaging data are used as the test set.
Another advantage that can be realized by exploiting the large number of cell images that are acquired with holographic cytology can be seen by noting the number of images required to construct the training data. When logistic regression is trained with a low number of cells, the trained algorithms cannot distinguish images in the test data at optimal performance and therefore it can be assumed that the variability within the total population is not completely captured by a smaller training data set. The number of data points required to capture the characteristics of the total population will vary depending on the type of the analyzed sample and the selected diagnostic criteria. For classification of QPM images of cells in stored blood units based on logistic regression, the analysis here shows that the number of training data should be greater than 5000 cells which would have been very difficult to accomplish using QPM with conventional scanning approaches such as using manual or automated translation stages.
The performances of algorithms for binary classification between week 1 and 5 are shown in Table 3. These algorithms were trained and tested by analyzing only cells from within the same sample. For the algorithm trained and tested with only the cells from Sample A, the highest accuracy in classifying the cells according to storage time was achieved, at 85.6% accuracy. The lowest performance was for the algorithm which was trained only on images of cells from Sample E with 78.2% accuracy. Given that Sample A exhibits the only outlier in the trends of OV and cell area, it is important to note that 23 other morphological parameters, extracted from the quantitative phase images are also used to train the algorithms allowing high classification accuracies. Thus, even though OV and cell area are presented in detail here, all of the remaining morphological parameters show significantly significant differences over the storage time. This illustrates the wealth of information that can be obtained from QPM images.
For algorithms that were trained using randomly selected images of cells across all the samples to build a binary classifier (as shown in Table 4), the overall classification accuracy decreases to 69.9%, a decrease that should be expected. The decrease in the performance of the algorithm relative to those trained only within the same population can be attributed to donor variability [24,25]. As can be seen in Figures 5 and 6, the RBCs from different donors have varying morphological parameter distributions as well as distinctive changes over time that describe the variation in storage lesion. While some general trends in storage lesion can be expected to hold for all potential donors, it stands to reason that it may manifest with small variations depending on the source of the blood unit as well as its handling. Therefore, it is reasonable to expect that the performance of the universal classifier would decrease relative to those trained on individual samples. In order to fully characterize the range of morphological changes that can appear due to storage lesion, broader studies with RBCs from a greater number of different donors should be undertaken.
One possible source of variation in the observed trends across samples is the fact that each unit of RBCs is comprised of a continuum of cells of various ages from those at day 1 in the circulation to those that are at the end of the circulatory life [27]. Hence, the age distribution of cells imaged at one specific storage time point will inevitably have an overlap with the other time points examined here and therefore some cross-classification between storage time points by the algorithms is unavoidable. In the future, holographic cytometry could be used to image RBCs that have been fractionated by age to acquire information from cells with narrower age distribution possibly increasing the overall performance of the algorithms and improving our understanding of storage lesion. In addition, the results of holographic cytometry could be combined with other approaches to enhance the ability to detect blood doping. For example, the discovery of the RNA species in RBCs [31] have enabled the identification of the transcriptional changes during RBC storage [32] that may be used together to enhance our ability to identify stored RBCs.
Conclusion
In this study, we have employed holographic cytometry to image large numbers of cells at high throughput. For application to storage lesion, the approach provides data that better represents the variation within blood units than randomly sampling a handful of cells alone. By characterizing a greater number of cells at the individual level than previous studies, holographic cytometry is able to identify consistent morphological changes in RBCs over the storage period with trends that agree with previous literature findings. In the future, by imaging samples from more subjects with diverse demographics, donor variability can be evaluated for establishing a cell-based predictor of transfusion yield. The potential of holographic cytometry as a screening tool is demonstrated by the classification performance of the logistic regression algorithms trained with morphological traits extracted from individual cell images, rather than from whole cell images. This distillation step enables examination of broader population than would be enabled by direct evaluation of the cell images, where the volume of data may pose an obstacle for efficient training and application of algorithms. The classification algorithms show promising accuracies that could be improved by narrowing the distribution of the training data using demographic information such as age, gender and health history, among others.
Acknowledgements
Grant support from NIH (No. 1R21-ES029791), Partner of Clean Competition (PCC), World Anti-doping Agency (WADA) and financial funding from the Duke University are gratefully acknowledged. We also thank the members of the BIOS lab for helpful discussions.