Abstract
We demonstrate a statistical modeling technique to recognize T cell responses to different external environmental conditions using membrane distributions of T cell receptors. We transformed fluorescence images of T cell receptors from each T cell into estimated model parameters of a partial differential equation. The model parameters enabled the construction of an accurate classification model using linear discrimination techniques. We further demonstrated that the technique successfully differentiated immobilized T cells on non-activating and activating surfaces. Compared to machine learning techniques, our statistical technique relies upon robust image-derived statistics and achieves effective classification with a limited sample size and a minimal computational footprint. The technique provides an effective strategy to quantitatively characterize the global distribution of membrane receptors and other intracellular proteins under various physiological and pathological conditions.
Introduction
The plasma membrane has a specific protein composition that plays pivotal roles in a wide variety of cellular processes, including receptor-mediated signaling, drug interactions, endocytosis and transport, and cellular communication. Molecular clustering of plasma membrane proteins provides a means to modulate intracellular signal transduction.1–4 Recent studies utilizing advanced microscopy have revealed new insights into clusters of membrane proteins and their distinct roles in signaling.1,5–15 To extract the nanoscopic structural information, various statistical clustering algorithms have been developed to help circumvent the artifacts that arise from single-molecule localization microscopy. Conventional grouping strategies, such as Ripley’s K-function and density-based spatial clustering, can be biased toward densely labeled regions or the repeated appearance of single molecules across multiple frames.16 Pair correlation analysis overcomes the stochastic variations of the fluorophores.17,18 Although these techniques have advanced the quantitative characterization of the spatial organization of membrane proteins on the nanometer scale, there remains a lack of strategies for characterizing the global protein distribution.
Increasing evidence suggests that the mesoscale organization of intracellular proteins contains “fingerprint information” about the cellular states.19 The spatial organization of membrane proteins may provide a means to infer the initial cellular response to the external environment. In this study, we developed a classification technique by using T cells whose membrane receptors have been known to correlate with the immune response. T cell receptor (TCR) membrane domains9,10,20 and monomers21 on quiescent T cells, as well as the aggregation of TCRs on activated T cells in vitro22–24 and in vivo25 have been observed. Single-molecule tracking experiments have revealed the role of signal dispersion and amplification of TCR at the plasma membrane.26 In addition, multi-dimensional analysis of TCR dynamics using advanced lattice light-sheet microscopy has also enabled the prediction of T-cell signaling states.27
Specifically, we sought to evaluate whether steady-state TCR images from standard fluorescence microscopy can be used to differentiate T cells exposed to different external environments. A popular solution to achieve this capability is through data-driven artificial intelligence (AI) techniques, such as deep learning.28,29 Computer-aided assessments of bioimages have enhanced our understanding of non-visual image differences for diagnostic and prognostic purposes.30 Despite these new promises, AI techniques face formidable challenges. Controversies arise regarding a lack of transparency in the “black box” of AI algorithms.31 Current AI-based image class discrimination techniques rarely result in model parameters whose statistical significance can rigorously support the image model hypothesis.32 AI techniques also demand long processing time and operator attention to achieve quantitative image modeling.33
To address these limitations, we have developed an analytical image analysis technique based on partial differential equation (PDE) image models, linear class discrimination of the estimated model parameters, and logistic regression estimation of individual cell class probabilities. We termed the technique Statistical Classification Analyses of Membrane Protein Images (SCAMPI). We demonstrate that non-visual cues from diffraction-limited fluorescence images can be harnessed to extract the characteristic information pertaining to the specific surface condition with which a T cell interacts. We realized SCAMPI using two discrimination models: Fisher Linear Discrimination (FLD) and Logistic Regression (LR). SCAMPI eliminates the need for computationally expensive algorithms. Moreover, the models generated by SCAMPI carry image-derived information and can be used to investigate a wide range of membrane protein organizations.
Results
Image model construction
Previous studies have validated PDE models of images.34–39 We implemented a general ordinary least squares (OLS) modeling strategy: Image = Model + Residuals, where we identified the best Model by minimizing the variance of Residuals. Specifically, we estimated digital image models as vectorized partial difference equations (PdEs) subject to OLS parameter estimation. The rationale for this strategy is that non-degenerate images have statistically significant, sample-based, two-dimensional autocorrelation functions, as do linear, stationary partial difference equations.30,36 Fig. 1a illustrates that a fluorescence image can be modeled as a combination of pixel-shifted images in coordinates x, y, and x & y. Similar properties also hold for stationary PdEs. Though such models are mathematically rigorous, their parameters have typically required machine learning techniques for estimation.36 We overcame this disadvantage with a simple image matrix-to-vector transformation (in column-major order) that results in an analytic PdE parameter estimation procedure (Fig. 1b). Such a strategy was initially successful in medical imaging, such as a dementia discrimination application using MRI brain scans.36 For modeling a typical fluorescence image, we constructed an image model as a general linear PdE with constant coefficients (Supplementary Note 1). The OLS estimates of the model parameters are obtained and their significance is evaluated by Student-t scores. If the estimates are significant, the corresponding model parameters are retained. If not, the spatial lags in pixels are increased to construct an alternative image model (Fig. 1b). Fig. 1c shows the intensity profile of a typical fluorescence image of the TCRs in a T cell (170-by-170 pixels). Fig. 1d demonstrates the corresponding image model constructed from four spatial lags (166-by-166 pixels).
a) Formulating an image spatial lag structure for the image to be modeled. b) A flowchart outlining the procedures of obtaining image model parameters through ordinary least-squares estimation. c) A representative intensity profile of a fluorescence image of T cell receptors from a T cell and d) its OLS image model constructed from a model with 3 parameters. See Supplementary Note 1 for regression statistics.
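The model-construction loop of Fig. 1b can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: the function names, the stopping rule (stop at the first lag order at which every |t| exceeds a threshold), and the use of classical homoskedastic t-scores are all assumptions made here for brevity. The sketch also counts only the shifted copies as regressors, so its column count for a given lag order may not match the parameter bookkeeping used in the text.

```python
import numpy as np


def lagged_design(img, r):
    """Build the response q and design matrix Z for lag order r.

    q is the column-major vec of the un-shifted sub-image; each column
    of Z is the vec of one pixel-shifted copy v[i-k, j-l].
    """
    m, n = img.shape
    q = img[r:, r:].ravel(order="F")
    cols = []
    for k in range(r + 1):        # shift along the first index i
        for l in range(r + 1):    # shift along the second index j
            if k == 0 and l == 0:
                continue          # the un-shifted image is the response
            cols.append(img[r - k:m - k, r - l:n - l].ravel(order="F"))
    return q, np.column_stack(cols)


def ols_t_scores(q, Z):
    """OLS estimate b with classical (homoskedastic) Student-t scores."""
    b, *_ = np.linalg.lstsq(Z, q, rcond=None)
    resid = q - Z @ b
    sigma2 = resid @ resid / (Z.shape[0] - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return b, b / np.sqrt(np.diag(cov))


def fit_image_model(img, t_crit=1.96, max_lag=6):
    """Raise the lag order until all parameters are significant (assumed rule)."""
    for r in range(1, max_lag + 1):
        q, Z = lagged_design(img, r)
        b, t = ols_t_scores(q, Z)
        if np.all(np.abs(t) > t_crit):
            break
    return r, b, t
```

With one spatial lag, the three columns of Z are ordered to match β0,1, β1,0, and β1,1, and a 170-by-170 image leaves 169 × 169 rows for the regression.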
Linear discrimination using model parameters
Model parameters obtained from the PDE image models are used to achieve class discrimination by the Fisher linear discriminant method.40 FLD projects individual parameter vectors onto a line so as to maximize the separation between projected parameter vectors while minimizing the variation within the projected vectors (Fig. 2a, and Supplementary Note 2). We applied FLD to model parameters from a training image data set by grouping images into two classes: class 0 (activation) and class 1 (non-activation). Test images, not seen by the classification model, were used to evaluate the precision of class discrimination (Fig. 2b).
a) Illustration to maximize the separation between two groups of images while minimizing the variation within each group. b) Construction of the class discrimination model using the training data and evaluation of the classification model using the test data. c) Use of TCR images from non-activating (poly-L-lysine-coated) and activating (OKT3-coated) surfaces to develop and demonstrate SCAMPI. d) Representative fluorescence images of TCRs from Jurkat T cells on the non-activating surface. e) Representative fluorescence images of TCRs on the activating surface.
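The two-class projection described above can be sketched as follows. This is a minimal NumPy illustration rather than the authors' implementation: it uses the standard two-class closed form, in which the Fisher direction is proportional to the inverse within-class scatter times the difference of class means, and it assumes a midpoint threshold between the projected class means. The function names are hypothetical.

```python
import numpy as np


def fisher_direction(B0, B1):
    """Unit vector proportional to Sw^{-1} (mean1 - mean0)."""
    m0, m1 = B0.mean(axis=0), B1.mean(axis=0)
    # Within-class scatter: sum of the two class scatter matrices.
    Sw = (B0 - m0).T @ (B0 - m0) + (B1 - m1).T @ (B1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)


def fit_fld(B0, B1):
    """Fit on training parameter matrices (rows = images, columns = parameters)."""
    w = fisher_direction(B0, B1)
    # Assumed decision rule: midpoint between the projected class means.
    threshold = 0.5 * ((B0 @ w).mean() + (B1 @ w).mean())
    return w, threshold


def classify(B, w, threshold):
    """Project unseen parameter vectors and return 0/1 class labels."""
    return (B @ w > threshold).astype(int)
```

Training images supply `B0` and `B1`; test images are only projected with the fitted `w`, mirroring the train/test split of Fig. 2b.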
Development of SCAMPI
To collect membrane protein images for developing SCAMPI, we obtained total internal reflection fluorescence (TIRF) images of TCRs from CD3-EGFP Jurkat T cells on two types of glass surfaces (Fig. 2c). Class 1 represented TIRF images acquired from T cells on a non-activating surface coated with poly-L-lysine (PLL). The electrostatic interactions between positively charged PLL and negatively charged cell membranes facilitate cell attachment to the glass surface for imaging. Class 0 represented TIRF images acquired from T cells on an activating surface coated with the OKT3 antibody. OKT3 cross-links the CD3 molecule of the TCR and induces T cell activation. Images were collected with a 100×/1.49 TIRF objective and a Photometrics Prime 95B sCMOS camera with an image pixel size of 110 nm. Fig. 2d shows representative TCR images from the non-activating PLL surface, while Fig. 2e shows representative images from the OKT3-coated activating surface.
For the FLD-based SCAMPI, the 97 active-cell and 100 non-active-cell TCR images were randomly divided into training data sets of 80 and 78 images and prediction (test) data sets of 20 and 19 images, representing class 0 and class 1, respectively. For each image, we obtained model parameters and computed their Student t statistics using the White41 parameter covariance matrix estimate corrected for heteroskedastic and autocorrelated residuals; see Fig. 1b. The training and testing regression models used spatial lags of six pixels, which provided 49 OLS model parameters for each image. This represented going around the loop in Fig. 1b six times (Supplementary Note 1).
With six spatial lags, the model OLS regressions have a database of 26,896 (164×164) pixels. Each of the 49 OLS model parameters is therefore supported by approximately 549 pixels (26,896/49), so overfitting is not an issue. For the 80-78 cell training FLD, classification accuracy was 88.6%. The 49-element projection vector from that FLD projected the test image parameters as shown in Fig. 3a with a 94.95% classification accuracy. The total runtime for this experiment was 21.89 seconds on a laptop computer.
a) FLD of 20 class 0 and 19 class 1 test images classified with 94.95% accuracy using an 80-78 image training set. b) Small sample FLD discrimination of 97.5% accuracy from 20 randomly selected images from each class. c) LR discrimination using the same small sample data as in b. d) Cross-comparison of the FLD and LR discrimination using the small sample data.
Estimating cell class probabilities and linear discriminants
The effectiveness of FLD-based SCAMPI is attributed to the fact that model discrimination parameters depend on both the spatial distribution and fluorescent intensity of individual TCR clusters. The FLD projection of an individual cell represents an optimal weighted average (FLD eigenvector weights) of that cell’s OLS parameters. Nevertheless, the FLD projection does not represent the probability that a particular cell belongs to a particular class. Such information becomes useful for evaluating the T cell response when an environmental variable changes, such as the ligand concentration and composition. Of note is that TCRs recognize a single agonist peptide embedded in the major histocompatibility complex and T cells can be activated by a few peptides.42 The mechanism of signal dispersion and amplification remains a focus of current imaging investigations. To develop a discrimination technique by which an individual cell class probability can be estimated, we further developed SCAMPI to include logistic regression (LR).
LR as a discrimination tool in this application uses an image’s OLS parameters to estimate the probability that an image belongs to a specific class.40 LR validation constraints differ from those of FLD because LR is a nonlinear regression. In addition, LR provides class probabilities for individual test subjects, a capability that FLD and machine learning classification techniques lack. Fig. 3b shows 97.5% accurate FLD projections using a 20-20 small sample size, and Fig. 3c indicates LR class probability estimates for the same 20 random images chosen from each class of the 100-97 image dataset (Supplementary Note 3). Of note is that, in the small sample regime, FLD and LR achieved similar classification results. Fig. 3d captures the classification consistency of FLD projections and LR probabilities, thereby independently validating the FLD discrimination results.
Discussion
The successful class discrimination using 20 training images from each class demonstrates that SCAMPI is capable of accurate class discrimination using a small sample size. We attribute this capability to two salient factors: first, a vector transformation that provides OLS regressions with a very large number of samples (the three-parameter model of Fig. 3 has 9,520 samples per parameter), resulting in robust image-derived statistics; and second, the optimal minimization of inter-system noise by OLS estimation. Unlike machine learning techniques, the PDE image model not only carries information about the number of TCR clusters and their size and shape but also the detailed image spatial structure. The latter contains characteristics of the spatial distribution of TCR clusters. Each training image positively enhances the class discrimination model. In our demonstration, as few as 20 images per class were found to be sufficient in achieving accurate class separation and probabilistic corroboration. Importantly, over-fitting to a specific classification model may occur and degrade class separation by SCAMPI. For example, a test evaluation of a 40-40 image sample yielded a 72.5% FLD accuracy against a 97.5% accuracy for the 20-20 image sample. Statistical methods for setting such fitting limits are not available in machine learning methods. As such, machine learning techniques require big data to construct an empirical discrimination model, in which image-derived statistics are quickly lost during model optimization.
In SCAMPI, we removed inter-system variations by using imaging data acquired from the same cell line, by the same imaging system, and under the same imaging conditions. Such a requirement is necessary because image model parameters are sensitive to the image format and quality. Unique characteristics related to the optical system, sample preparation, and data acquisition have been normalized within these imaging data. These characteristics include the point spread function of the optical system, higher-order optical aberrations, sample labeling densities, photophysical properties of different fluorescence labels, pixel size, and quantum efficiency of the detector camera, all of which play critical roles in the fluorescence imaging data. These inter-system variations represent a major risk of misclassification, and the noise they present can be minimized in variance by OLS estimation.
Through SCAMPI, we also revealed that fluorescence images of TCR contained “signature information” about the T-cell response. The clustering of TCRs is well known to correlate with the early signaling events during T cell activation.1,3,25,43 SCAMPI shows that non-visual cues from images of membrane receptors can be captured by statistical techniques and effectively utilized to characterize the cell response to the external environment. More importantly, such information can be extracted from standard fluorescence images with a relatively small test sample size. Combined with its small computational footprint, SCAMPI may find its way into clinical settings where potential treatment benefit can be evaluated based on the discrimination image model constructed from cells derived from responders and non-responders. SCAMPI may pave the way for improving the treatment response rate targeting membrane receptors, such as in immunotherapy.
SCAMPI can be readily applied to other membrane receptors and fluorescent labeling techniques. Moreover, SCAMPI is amenable to multiplexed imaging data. In this regard, SCAMPI can be developed using high-dimensional statistics. Coupled with the development of automated and highly multiplexed super-resolution imaging techniques,43 SCAMPI has the potential to reveal more complex and global protein interrelationships beyond colocalization and correlation analysis.
In summary, we report a linear discrimination technique SCAMPI to discriminate activated from non-activated T cells based on the spatial organization of T cell receptors. SCAMPI harnesses non-visual cues of fluorescence images and rapidly classifies cellular states with sample-derived statistics. Most importantly, SCAMPI is immune to the drawbacks of AI techniques. It represents a fresh approach to the “big data” challenge and potentiates the fluorescence image-based discovery of structural features related to the cellular states.
Materials and Methods
Cells and reagents
Jurkat E6–1 T cells that express CD3-EGFP were cultured in RPMI 1640 Medium (from Gibco, USA, CAT#: 11875119) supplemented with 10% Fetal Calf Serum (FCS) (from Gibco, USA, CAT#: 14190-149) in a humidified atmosphere at 37°C. Cells were incubated in an imaging buffer consisting of HBSS (from Life Technologies, USA, CAT#: 14175-095) supplemented with 1% FCS before the fixation. Monoclonal antibody against CD3ε (clone: OKT3, CAT#: BE0001-2-25MG) was purchased from Bio X Cell, USA. Fixation buffer consisting of 4% paraformaldehyde (Alfa Aesar, USA, CAT#: 43368) and 0.1% glutaraldehyde (Electron Microscopy Sciences, USA, CAT#: 16100) was used to fix cells on the coated surfaces.
Surface Preparation
Eight-well chambered cover glasses (Borosilicate sterile No 1.5, CAT# 155409, Lab-Tek) were cleaned with absolute ethanol and dH2O, then incubated overnight at room temperature. Activating surfaces were produced by adding OKT3 antibody (200 μL) at a concentration of 1 μg/ml in PBS (from Gibco, USA) into a well. Poly-L-lysine (PLL) surfaces were produced by adding PLL (200 μL) at a concentration of 0.01% in H2O (P8920 from Sigma-Aldrich, CAS#: 25988-63-0) into another well. Eight-well chamber slides containing OKT3 and PLL were incubated overnight at 37°C.
Imaging TCR clusters
Supernatants of the wells containing OKT3 and PLL were decanted and cells (100k) were added to each well. The cells were incubated at 37°C for 8 minutes. After the incubation, cells were observed under a conventional microscope to confirm that they had attached to the surface. Supernatants of the OKT3- and PLL-coated wells were decanted and fixation buffer (250 μL) was added to the wells. The samples were incubated for 15 minutes at room temperature and then rinsed thoroughly with PBS.
Total internal reflection fluorescence (TIRF) microscopy
TIRF microscopy experiments were performed on a Nikon Eclipse Ti2 inverted microscope equipped with a 100×/1.49 oil-immersion objective. For TIRF imaging, a 488 nm laser was used. Emission light was filtered using appropriate filter sets and recorded on a Prime 95B sCMOS camera with a pixel size of 110 nm in the image plane. Images of TCR clusters were acquired with 2.15 mW (15%, 488 nm) laser power at a 400 ms exposure time.
SCAMPI Standard Model Statistics
FLD-based SCAMPI is an effective discriminator because model discrimination parameters depend on both spatial distribution and fluorescent intensity of individual TCR clusters. But T cell response to environmental variations, such as the ligand concentration and composition, must be of interest in discrimination and beyond, if one attempts to regulate this response. In particular, T cells can recognize a single agonist peptide embedded in a major histocompatibility complex and also be activated by a small number of peptides.42 Such multidimensional excitation sensitivity requires a standardized T cell model whose parameters capture cell response nuances to multiple excitations, are comparable across experiments, discriminate cell classes, and estimate individual cell probabilities of class membership.
The standardization proposed in the following model is based on the diffusive and advective cell structures found in the cell literature. For this purpose, we propose the PDE model in (1a), which is a temporal equilibrium form of a nonhomogeneous, hyperbolic PDE (Supplementary Note 1). Its digital, estimable form in (2a) clearly illustrates the model dependence on protein advection (parameters β0,1 and β1,0) and diffusion (parameter β1,1).
To meet the demands placed on (2a) as a T cell protein membrane model it is necessary to restrict the number of images for estimating the βk,l so that it remains an accurate discriminator and, as discussed below, simultaneously an accurate predictor of individual class probabilities.
Further, for comparative testing of large image sets for response homogeneity, the image support for (2a) should be a small fraction of the classes being tested.
A random selection of 20 active cell and 20 non-active cell images was used to estimate the βk,l in (2a) as bk,l. All 20-element b parameter vectors passed a Kolmogorov-Smirnov test for normality at 0.05 or better (Table 1).
Mean (n = 20) model parameters and mean Student-t tests of parameters for image models constructed with 3 parameters (one spatial lag).
Fig. S1 is a typical normal distribution check for these estimates. The Student-t statistics in Table 1 were computed using the White asymptotic parameter covariance matrix since OLS image residuals are frequently autocorrelated and heteroskedastic. Each image regression had 28,561 (169-by-169) pixels, so the White asymptotic matrix almost surely applied.
Author contributions
YSH and WDO conceived of the study. RM conducted fluorescence imaging. WDO performed image model construction and linear discrimination analyses. All authors contributed to the preparation of the manuscript.
Competing interests
The authors declare no competing interests.
Data availability
Image data and MATLAB codes are available upon request to the corresponding authors.
Supplementary Information
Note 1: Image modeling method, an example
We hypothesize the gray-scale pixel intensity value, v(x, y), of an image in Cartesian coordinates satisfies the PDE:
In (1a), u(x, y) is a zero-mean random noise variable whose variance is minimized to estimate the α parameters. This equation has a long history as a model for a wide range of images31,33 but also as an advective and diffusion model of particles coagulating over space and time.44 To estimate the α parameters, we approximate (1a) with a partial difference equation (PdE) on a grid indexed by x = iΔx, y = jΔy and approximate derivatives by backward differences. In our imaging experiments, Δx = Δy = 110 nm. For discrete images of unit-width pixels, (1a) becomes the matrix equation
$$v_{i,j} = \beta_{0,1}\, v_{i,j-1} + \beta_{1,0}\, v_{i-1,j} + \beta_{1,1}\, v_{i-1,j-1} + e_{i,j} \tag{2a}$$
in which ei,j is the spatially discrete version of u(x, y). The vector transform of a matrix sum is the sum of vector transforms. With q = vec(vi,j), z1 = vec(vi,j−1), z2 = vec(vi−1,j), z3 = vec(vi−1,j−1), and vec(ei,j) = ε, (2a) becomes
$$q = Z\beta + \varepsilon, \qquad Z = [\, z_1 \;\; z_2 \;\; z_3 \,] \tag{3a}$$
in which the q vector represents the image to be modeled, Z is a design matrix of spatially lagged versions of q, βT = [β0,1 β1,0 β1,1], and ε is a zero-mean residual error vector whose variance is minimized by the OLS estimate b of β,
$$b = (Z^{T} Z)^{-1} Z^{T} q. \tag{4a}$$
Zb is the OLS estimate of the image and $\hat{\varepsilon}$ is the estimated image model error,
$$\hat{\varepsilon} = q - Zb.$$
For the model in (2a), image pixels must be sacrificed to make q and Z compatible for addition. This data loss is not usually significant; for example, for the images in Fig. 2, the number of samples per estimated OLS parameter decreases from 9,633 to 9,408.
The Student t statistics are computed using the White parameter covariance matrix estimate corrected for heteroskedastic and autocorrelated residuals.41 We found it is common for OLS image models defined by vec transformations to exhibit heteroskedastic and autocorrelated residuals; that is, the random error terms are, in all probability, from different distributions and correlated. The White parameter covariance estimates are asymptotic results. With 28,224 degrees of freedom in this regression, such asymptotic conditions surely prevail. The extraordinarily large Student t values reflect the large degrees of freedom per estimated parameter, which is typical of OLS image models.
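A heteroskedasticity-consistent covariance estimate of the kind described above can be sketched as follows. This is a simplified illustration, assuming the basic White sandwich form (often labeled HC0), which corrects for heteroskedasticity only; the additional correction for autocorrelated residuals mentioned in the text is not reproduced here. The helper names are hypothetical.

```python
import numpy as np


def one_lag_design(img):
    """Column-major vec of the image and its three one-pixel-shifted copies."""
    q = img[1:, 1:].ravel(order="F")
    Z = np.column_stack([
        img[1:, :-1].ravel(order="F"),    # v[i, j-1]   -> beta_{0,1}
        img[:-1, 1:].ravel(order="F"),    # v[i-1, j]   -> beta_{1,0}
        img[:-1, :-1].ravel(order="F"),   # v[i-1, j-1] -> beta_{1,1}
    ])
    return q, Z


def white_t_scores(q, Z):
    """OLS estimate b and t-scores from the White (HC0) sandwich covariance."""
    b, *_ = np.linalg.lstsq(Z, q, rcond=None)
    e = q - Z @ b                              # OLS residuals
    bread = np.linalg.inv(Z.T @ Z)
    meat = Z.T @ (Z * (e ** 2)[:, None])       # Z' diag(e^2) Z
    cov = bread @ meat @ bread
    return b, b / np.sqrt(np.diag(cov))
```

Each squared residual weights its own row of Z, so pixels with larger model error contribute more to the estimated parameter variance than they would under the classical constant-variance formula.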
The general linear, constant coefficient PdE of 2 independent variables has a discrete PdE representation:
A regression of the vector transform of (5a) can be shown to require (p+1)(q+1) OLS parameters. This general model of order r = max (p, q) of an m × n image requires (r + 1)(m + n − r − 1) pixels to be sacrificed as in (2a) above. Note that (2a) is (5a) for p = q = 1.
Note 2: Fisher Linear Discriminant
Each image has an OLS-estimated vector of parameters. Recall that the Fisher Linear Discriminator (FLD) will project individual vectors onto a line so that the variation between the projected samples is maximized relative to the variation within the projected samples. To see how this is executed with OLS image parameters, let B1 and B2 be the class matrices of parameter vectors of m1 and m2 respective images. Each vector is of parameter size n so the class matrices are m1 by n and m2 by n. Let G1 and G2 be the estimated covariance matrices of the B matrices and Gp the estimated covariance matrix of Bp =[B1 B2]T. Then the eigenvector vc satisfying:
for the unique eigenvalue λ ≠ 0, is the optimal projection vector for discriminating activated from non-activated cells. (6a) is solved for vc, which is then used to project the test class parameter vectors. The regression models were PdE models with six spatial lags, which amounted to 49 parameters per image, so vc is a 49-element vector for projecting the test images.
The test result of Fig. 3a required 158 training set OLS models and 39 test set OLS models, each with 26,896 sample points, plus the solution of (6a) for vc. This required 21.89 seconds of CPU time.
Discrimination efficacy increases with the number of parameters per image. However, constraints become tight for experiments with a relatively small number of class members. Equation (6a) only has a non-trivial solution for vc if the number of parameters is less than or equal to the total number of class members less two. 44 A second constraint is: As p + q in (5a) increases G1, G2, and Gp fail to be positive definite, implying the eigenvector solution of (6a) is no longer valid. 44 For the training classes of 158 total cell image samples, 156 or fewer parameters per image is a generous constraint. But the G matrices in (6a) are not positive definite for p + q > 7. If there are 20 images per class, then 38 parameters per image becomes a tight constraint, in addition to that of G matrix positive definiteness. For such cases, logistic regression becomes a plausible alternative classification approach (SI Note 3).
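The two constraints described above can be checked numerically before attempting an FLD. The sketch below is a generic check under stated assumptions, not the authors' procedure: it tests the parameter-count rule and tests positive definiteness of the sample covariance matrices through their smallest eigenvalue, with a relative tolerance chosen here for illustration. The function name is hypothetical.

```python
import numpy as np


def fld_feasible(B0, B1, rel_tol=1e-10):
    """Return (count_ok, covariances_positive_definite) for two class matrices.

    B0 and B1 hold one OLS parameter vector per row.
    """
    n_params = B0.shape[1]
    n_total = B0.shape[0] + B1.shape[0]
    # Rule from the text: parameters <= total class members less two.
    count_ok = n_params <= n_total - 2

    def is_pd(G):
        eig = np.linalg.eigvalsh(G)            # ascending eigenvalues
        return bool(eig[0] > rel_tol * eig[-1])

    G0 = np.cov(B0, rowvar=False)
    G1 = np.cov(B1, rowvar=False)
    Gp = np.cov(np.vstack([B0, B1]), rowvar=False)
    return count_ok, all(is_pd(G) for G in (G0, G1, Gp))
```

With 20 images per class, a 3-parameter model passes both checks, whereas a 49-parameter model fails both, because a covariance matrix estimated from 20 vectors cannot have full rank in 49 dimensions.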
Note 3: Class probability discrimination using Logistic Regression (LR)
Logistic regression as a discrimination tool estimates the probability that a given image belongs to a specific class and does so without FLD-type data constraints.40 Let i = 1,2,…40 be an image index for 20 inactive cells and 20 active cells. The Bernoulli random variable Yi is assumed to take the value yi = 1 if image i is in the active class and 0 if in the non-active class. pi is the probability that image i is in the active cell class conditioned on explanatory (independent) variables hypothesized to control class membership. Let the design matrix of explanatory variables for the LR be X, composed of rows xi. Then the hypothesized LR model uses data on yi and X to estimate a parameter vector φ in the logistic regression:
In (7a) we use for the xi rows, one for each cell image, the estimated row vectors of the B parameter matrices from OLS regressions of the 40 cell images. The class identities are known for all images from the experiment producing the cell images: For non-active cells 1 through 20, the yi data is 0 and active cells 21 through 40 have yi = 1. φ is estimated by maximizing the likelihood function of the independent Bernoulli distribution for 40 samples; this is a nonlinear optimization.40
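The maximum-likelihood step described above can be sketched with Newton-Raphson iterations. This is a minimal illustration assuming the standard logit link with a constant term; the tiny ridge term and the clipping of the linear predictor are numerical safeguards added here, not part of the method in the text, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Clip to avoid overflow in exp for extreme linear predictors.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_logistic(X, y, n_iter=50, ridge=1e-6):
    """Estimate phi (constant term first) by maximizing the Bernoulli likelihood."""
    A = np.column_stack([np.ones(len(X)), X])   # prepend the constant term
    phi = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = sigmoid(A @ phi)
        grad = A.T @ (y - p) - ridge * phi       # gradient of log-likelihood
        hess = (A * (p * (1 - p))[:, None]).T @ A + ridge * np.eye(A.shape[1])
        step = np.linalg.solve(hess, grad)
        phi += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return phi


def class_probability(X, phi):
    """Estimated probability that each row of X belongs to class 1."""
    A = np.column_stack([np.ones(len(X)), X])
    return sigmoid(A @ phi)
```

Here each row of `X` would hold one image's OLS parameter estimates and `y` the known 0/1 class labels; `class_probability` then returns one probability per image.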
A goodness of fit measure, analogous to R2 for OLS, has been, and is, controversial regarding the number of samples per parameter estimated in the φ vector.45 Further, unlike OLS where there are numerous testable statistics available, in LR statistical significance is still mostly reliant on Monte Carlo simulation, so there are only a few robust tests to decide which variables to include in a regression. The Tjur46 Coefficient of Discrimination for the images used in Fig. 3c produced the statistics given in Table S2.
The parameter φ0 is the mandatory constant term required of LR. The columns of the 40 by 3 X matrix passed the Kolmogorov-Smirnov test for normality at the 0.05 level for each class, which is to be expected of parameters estimated by an OLS regression of 28,224 degrees of freedom. This is a distinct advantage of LR-based SCAMPI since it is known that logistic regressions with normal independent variables yield robust Wald statistics.47 With normally distributed independent variables (see Figure S1), the Wald statistics are distributed Student-t with 17 degrees of freedom, so p < 0.0398 for all of them. The coefficient of discrimination, COD, for a perfect fit is 1.00 and the ideal Deviance is 0.0.
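Tjur's coefficient of discrimination is the difference between the mean estimated probability in the two observed classes, which is why a perfect fit gives 1.00. A minimal sketch, with a hypothetical function name and made-up probabilities used purely for illustration:

```python
import numpy as np


def tjur_cod(p, y):
    """Mean predicted probability for y == 1 minus that for y == 0."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y)
    return p[y == 1].mean() - p[y == 0].mean()


# Illustrative values only, not data from this study.
example = tjur_cod([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
```

In this toy case the class means are 0.85 and 0.15, so the coefficient is 0.70; values near zero would indicate that the fitted probabilities do not separate the classes.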
Tables S1 and S2 present highly significant OLS image parameters yielding highly significant logistic regression probability estimates for the same images. The consistency in the three stages of optimization, OLS, FLD, LR, to achieve these results is captured in Fig. 3d.
Typical normality check of 20 bk,l parameters against a N(0,1) cumulative distribution function.
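A normality check of this kind can be sketched by computing the Kolmogorov-Smirnov distance between the empirical distribution of standardized parameter estimates and the N(0,1) cumulative distribution function. The sketch below is an illustration under assumptions: the estimates are standardized with their own sample mean and standard deviation, and the suggested cut-off is the large-sample 5% approximation 1.36/√n, which is only approximate for n = 20 and for estimated moments. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ks_distance_to_normal(values):
    """Largest gap between the empirical CDF of standardized values and N(0,1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    z = sorted((v - mean) / sd for v in values)
    d = 0.0
    for i, zi in enumerate(z, start=1):
        f = normal_cdf(zi)
        # Compare the model CDF with the empirical CDF just after and just before zi.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def approx_critical_value(n):
    """Large-sample 5% critical value (an approximation for small n)."""
    return 1.36 / math.sqrt(n)
```

A set of 20 estimates would be judged consistent with normality when `ks_distance_to_normal` falls below `approx_critical_value(20)`.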
Acknowledgements
The authors acknowledge support from the Department of Chemistry at the University of Illinois at Chicago.