Abstract
We present a conditional generative model for learning variation in cell and nuclear morphology and predicting the location of subcellular structures from 3D microscopy images. The model generalizes well to a wide array of structures and allows for a probabilistic interpretation of cell and nuclear morphology and structure localization from fluorescence images. We demonstrate the effectiveness of the approach by producing and evaluating photo-realistic 3D cell images using the generative model, and show that the conditional nature of the model provides the ability to predict the localization of unobserved structures, given cell and nuclear morphology. We additionally explore the model’s utility in a number of applications, including cellular integration from multiple experiments and exploration of variation in structure localization. Finally, we discuss the model in the context of foundational and contemporary work and suggest forthcoming extensions.
1. Introduction
A central conjecture throughout cell biology is that structure determines function. Thus motivated, location proteomics (Murphy, 2005) aims to determine cell state – i.e. subcellular organization – by elucidating the localization of all structures and how they change through the cell cycle and in response to perturbations such as changes in environment or mutation. However, determining global cellular organization is challenging, due in no small part to the multitude of different molecules, complexes and organelles that comprise living cells and determine their behaviors (Kim et al., 2014). While advances in microscopy, particularly live cell fluorescence imaging, have permitted enormous insight and rich datasets with which to explore subcellular organization, the experimental state of the art for live cell imaging is currently limited to the simultaneous visualization of only a small number (2–6) of tagged molecules. Statistical modeling approaches can address this limitation by integrating subcellular structure data from diverse imaging experiments.
Image feature-based methods have previously been employed to describe and model cell organization (Boland & Murphy, 2001; Carpenter et al., 2006; Rajaram et al., 2012). While useful for discriminative tasks, these approaches do not explicitly model the relationships among subcellular components, which limits their utility for integrating all of these structures, i.e. for creating an integrated cell model.
Generative models are useful in this context. They capture variation in a population and encode it as a probability distribution, accounting for the relationships among structures. Foundational work has previously demonstrated the utility of expressing subcellular structure patterns as a generative model, which can then be used as a building block for models of cell behavior (e.g. Murphy, 2005; Donovan et al., 2016).
Ongoing efforts to construct generative models of cell organization are primarily associated with the CellOrganizer project (Zhao & Murphy, 2007; Peng & Murphy, 2011). CellOrganizer implements a “cytometric” approach to modeling that considers the number of objects, lengths, sizes, etc. from segmented images and/or inverse procedural modeling, which can be particularly useful both for analyzing image content and for approaching integrated cell organization. These methods support parametric modeling of many subcellular structure types and, as such, generalize well when only small amounts of appropriate imaging data are available. However, these models may depend on preprocessing methods, such as segmentation, or other object identification tasks for which a ground truth is not available. Additionally, there may exist subcellular structures for which a parametric model does not exist or may not be appropriate, e.g. structures that vary widely in localization (diffuse proteins) or that reorganize dramatically during mitosis or a stimulated state (such as microtubules).
Thus, the presence of key structures for which current methods are not well suited motivates the need for a new approach that generalizes well to a wide range of structure localization.
Recent advances in generative adversarial networks (GANs) (Goodfellow et al., 2014) present one possible resolution to this dilemma. GANs have the ability to learn distributions over images, generate photo-realistic exemplars, and learn sophisticated conditional relationships; see e.g. Generative Adversarial Networks (Goodfellow et al., 2014), Variational Autoencoders/GANs (Larsen et al., 2015), and Adversarial Autoencoders (Makhzani et al., 2015).
Leveraging these advances, we present a non-parametric model of 3D cell shape and nuclear shape and location, and relate it to the variation of other subcellular components. The model is trained on data sets of approximately 160–4400 fluorescence microscopy images per labeled structure; it accounts for the spatial relationships among these components and their fluorescent intensities, and it generalizes well to a variety of localization patterns. Using these relationships, the model allows us to predict the outcome of theoretical experiments, as well as encode complex image distributions into a low dimensional probabilistic representation. This latent space serves as a compact coordinate system to explore variation.
Here, we present the integrated cell model, a discussion of the training and conditional modeling, and initial results demonstrating its utility. We then briefly discuss the results in context, current limitations of the work and future extensions.
2. Model Description
Our generative model serves several distinct but complementary purposes. At its core, it is a probabilistic model of cell and nuclear shape (specifically, of cell shape and nuclear shape and location) conjoined with a probability distribution for the localization of a given fluorescent protein (for these experiments, proteins were chosen to outline major cellular structures) conditional on cell and nuclear shape. This allows the model to classify localization patterns from images where the protein is unknown, and to predict the localization of unobserved structures de novo.
The main components of the model (Figure 1) are two modified autoencoders: one encodes the variation in cell and nuclear shape (reference structure model), and the other learns the relationship between subcellular structures dependent on this encoding (conditional model). Each autoencoder is regularized by the use of adversarial networks on the latent space (to enforce that the distribution of latent space representations matches a known distribution) and on the output (to enforce that de novo generated images match the data distribution).
Notation
The images input and output by the model are multi-channel (Figure 2). Each image x consists of both reference channels r and a structure channel s. Here, the cell and nuclear channels together serve as reference channels, and the structure channel varies, taking on one of the following structure types: α-actinin (actin bundles), α-tubulin (microtubules), β-actin (actin filaments), desmoplakin (desmosomes), fibrillarin (nucleolus), sialyltransferase (Golgi), lamin B1 (nuclear membrane), myosin IIB (actomyosin bundles), Sec61β (endoplasmic reticulum), TOM20 (mitochondria), and ZO1 (tight junctions). We denote which content is being used by superscripts: xr,s indicates all channels are being used, whereas xs indicates only the structure channel is being used, and xr only the reference channels. We use y to denote an index-valued categorical variable indicating which structure type is labeled in xs. For example, y = 1 might correspond to the α-actinin channel being active, y = 2 to the α-tubulin channel, etc. While y is a scalar integer, we also use a one-hot vector representation of y, with a one in the yth element and zeros elsewhere.
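To make this notation concrete, the following minimal sketch shows one way the multi-channel images and labels might be represented as tensors in PyTorch (the framework used for our implementation, Section 3.2); the channel ordering and shapes are illustrative assumptions, not the exact layout used in our code.

```python
import torch

K = 11  # number of labeled structure types in this dataset

# Hypothetical tensor layout for one cell image: channels are
# [cell membrane, nucleus, structure], spatial size 128 x 96 x 64 voxels.
x = torch.rand(3, 128, 96, 64)       # x^{r,s}: reference + structure channels
x_r = x[:2]                          # x^r: reference channels only
x_s = x[2:]                          # x^s: structure channel only

# y is an integer class index; y_onehot is its one-hot vector representation.
y = torch.tensor(1)                  # e.g. the alpha-tubulin channel is labeled
y_onehot = torch.nn.functional.one_hot(y, num_classes=K).float()
```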
2.1 Model of cell and nuclear variation
We model cell and nuclear shape using an autoencoder to construct a latent-space representation of these reference channels. The model (Figure 1, upper half) maps images of reference channels to a multivariate normal distribution of moderate dimension – here we use a 128 dimensional distribution. The choice of a normal distribution as the prior for the latent space is, in many respects, one of convenience, and of little consequence to model behavior. The nonlinear mappings learned by the encoder and decoder are coupled to both the shape and dimensionality of the latent space distribution; the mapping and the distribution only function in tandem – see e.g. Figure 4 in (Makhzani et al., 2015).
The primary architecture of the model is that of an autoencoder, which itself consists of two networks: an encoder Encr that maps an image xr to a latent space representation zr via a learned deterministic function q(zr|xr), and a decoder Decr that reconstructs images from the latent space representation via a similarly learned function p(x̂r|zr).
We use the following notation for these mappings:

zr = Encr(xr),    x̂r = Decr(zr),

where an input image x is distinguished from a reconstructed image x̂.
2.1.1. Encoder and Decoder
The autoencoder minimizes the pixel-wise binary cross-entropy loss between the input and the reconstructed input,

L_BCE(u, û) = − Σ_p [ u_p log(û_p) + (1 − u_p) log(1 − û_p) ],

where the sum is over all the voxels p in all the channels of the image u. We use this function for all images regardless of content (i.e. we use it for both xr and xr,s).
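In PyTorch this loss can be written compactly; the sketch below is illustrative, with arbitrary tensor shapes, rather than a copy of our training code.

```python
import torch
import torch.nn.functional as F

def bce_image_loss(u_hat, u):
    """Pixel-wise binary cross-entropy, summed over all voxels and channels of u."""
    return F.binary_cross_entropy(u_hat, u, reduction="sum")

# Illustrative shapes: a small batch of two-channel (cell and nucleus) reference images.
u = torch.rand(4, 2, 128, 96, 64)       # input images x^r, intensities in [0, 1]
u_hat = torch.rand(4, 2, 128, 96, 64)   # reconstructions (decoder output after a sigmoid)
loss = bce_image_loss(u_hat, u)
```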
2.1.2. Encoding Discriminator
In addition to minimizing the above loss function, the autoencoder’s latent space – the output of Encr – is regularized by the use of an encoding discriminator EncDr. This discriminator EncDr attempts to distinguish between latent space embeddings that are mapped from the input data, and latent space embeddings that are generatively drawn from the desired prior latent space distribution (which is a 128 dimensional multivariate normal in this case). In attempting to “fool” the discriminator, the autoencoder is forced to learn a latent space distribution q(zr) that is similar in distribution to the prior distribution p(zr) (Makhzani et al., 2015).
The encoding discriminator EncDr is trained on samples from both the embedding space, z ~ q(zr), and the desired prior, z̃ ~ p(zr). We refer to z as observed samples and z̃ as generated samples, and use the subscripts “obs” and “gen” to indicate these labels. Trained on these samples, EncDr outputs a continuous estimate, v̂ ∈ (0, 1), of which distribution a given sample was drawn from.
The objective function for the encoding discriminator is thus to minimize the binary cross-entropy between the true labels v and the estimated labels v̂ for generated and observed samples:

L_EncDr = L_BCE(v_obs, v̂_obs) + L_BCE(v_gen, v̂_gen)
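A minimal sketch of one update of this encoding discriminator is shown below; the network handles are placeholders (the real architectures are summarized in Section C), the discriminator is assumed to end in a sigmoid, and the use of 0/1 labels for generated/observed samples is an illustrative bookkeeping choice rather than the exact implementation.

```python
import torch
import torch.nn.functional as F

def encD_step(enc_r, enc_d, x_r, latent_dim=128):
    """One discriminator update: distinguish data embeddings from prior draws."""
    z_obs = enc_r(x_r)                               # embeddings of observed data
    z_gen = torch.randn(x_r.size(0), latent_dim)     # draws from the N(0, I) prior

    v_hat_obs = enc_d(z_obs.detach())                # detach: do not update Enc^r here
    v_hat_gen = enc_d(z_gen)

    # Observed embeddings are labeled 1, prior ("generated") samples 0.
    return (F.binary_cross_entropy(v_hat_obs, torch.ones_like(v_hat_obs)) +
            F.binary_cross_entropy(v_hat_gen, torch.zeros_like(v_hat_gen)))
```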
2.1.3. Decoding Discriminator
The final component of the autoencoder for cell and nuclear shape is an additional adversarial network DecDr, the decoding discriminator, which operates on the output of the decoder to ensure that the decoded images are representative of the observed data distribution, similar to that of (Larsen et al., 2015). We train DecDr on images from the data distribution, x_obs, which we refer to as observed images, and on decoded draws from the latent space prior, x̂_gen = Decr(z̃), which we refer to as generated images. The loss function for the decoding discriminator is then:

L_DecDr = L_BCE(v_obs, v̂_obs) + L_BCE(v_gen, v̂_gen)
2.2 Conditional model of structure localization
Given a trained model of cell and nuclear shape variation from the above network components, we then train a conditional model of structure localization upon the learned cell and nuclear shape model. This model (Figure 1, lower) consists of several parts, analogous to those outlined in the previous section: the core is a tandem encoder Encr,s and decoder Decr,s that encode and decode images to and from a low dimensional latent space; in addition, a discriminator EncDs regularizes the latent space, and a discriminator DecDr,s ensures that the decoded images are similar to the input distribution.
2.2.1. Conditional Encoder
The encoder Encr,s is given images containing both the reference structures and a single additional channel of tagged protein localization, xr,s, and produces three outputs:

(ẑr, ŷ, zs) = Encr,s(xr,s)
Here ẑr is the reconstruction of the cell and nuclear shape latent-space representation learned in Section 2.1, ŷ is an estimate of which structure channel is labeled, and zs is a latent variable that encodes all remaining variation in image content not due to cell/nuclear shape and structure channel. Therefore zs is learned dependent on the latent space embeddings of the reference structure, zr.
The loss function for the reconstruction of the latent space embedding of the cell and nuclear shape is the mean squared error between the embedding zr learned from the cell and nuclear shape autoencoder and the estimate ẑr produced by the conditional portion of the model:

L_ẑr = (1/d) Σ_i (zr_i − ẑr_i)²,

where d is the dimensionality of the latent space.
The output ŷ above is a probability distribution over structure channels, giving an estimate of the class label for the structure. In our notation, y is an integer value representing the true structure channel, taking a value in 1…K, while its one-hot encoding is a vector of length K equal to 1 at the yth position and 0 otherwise. Similarly, ŷ is a vector of length K whose kth element represents the probability of assigning the label y = k.
We use the softmax function to assign these probabilities. In general, the softmax function is given by

softmax(a)_k = exp(a_k) / Σ_j exp(a_j)
The loss function for ŷ is then the categorical cross-entropy between the one-hot encoding of y and the prediction ŷ:

L_ŷ = − Σ_k y_k log(ŷ_k)
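In PyTorch this classification term reduces to a standard softmax cross-entropy; the sketch below assumes `logits` are the pre-softmax class scores produced by Encr,s and is illustrative rather than the exact implementation.

```python
import torch
import torch.nn.functional as F

K = 11
logits = torch.randn(8, K)        # pre-softmax class scores for a small batch
y = torch.randint(0, K, (8,))     # true integer structure labels

y_hat = F.softmax(logits, dim=1)  # ŷ: probability of each structure label
loss = F.cross_entropy(logits, y) # categorical cross-entropy, -log ŷ[y] averaged over the batch
```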
The final output of the conditional encoder zs can be interpreted as a variable that encodes the residual variation in the localization of the labeled structure after conditioning on cell and nuclear shape.
2.2.2. Encoding Discriminator
As introduced above, the latent variable zs is regularized by an adversary EncDs that enforces the distribution of this latent variable to be similar to a chosen prior p(zs). The output of EncDs is a vector ŷEncDs that has |y| + 1 = K + 1 output labels, which take a value in [1, …, K, gen]. That is, ŷEncDs has one slot for observed embeddings of each particular labeled structure channel, and one additional slot for samples from our reference prior distribution. The loss function for the adversary is therefore the categorical cross-entropy between the true labels v (the structure type y for observed embeddings, or gen for prior samples) and the predictions v̂ = ŷEncDs:

L_EncDs = − Σ_k v_k log(v̂_k)
The construction of the loss function is intended to remove structure-specific information from zs, forcing that information into ŷ.
2.2.3. Conditional Decoder
The conditional decoder Decr,s outputs the image reconstruction given the latent space embedding zr, the class estimate ŷ, and the structure channel variation zs:

x̂r,s = Decr,s(zr, ŷ, zs)
The loss function for image reconstruction takes the same form as the binary cross-entropy in Section 2.1.1 (between the input and reconstructed image):

L_x̂r,s = L_BCE(xr,s, x̂r,s)
2.2.4. Decoding Discriminator
As in the cell and nuclear shape model, attached to the decoder Decr,s is an adversary DecDr,s intended to enforce that the reconstructed images are similar in distribution to the input images. Similar to Section 2.2.2, the output of this discriminator is a vector ŷDecDr,s that has |y| + 1 = K + 1 output labels, which take a value in [1, …, K, gen]. As above, the loss function is the categorical cross-entropy between the true labels v and the predictions v̂ = ŷDecDr,s:

L_DecDr,s = − Σ_k v_k log(v̂_k)
2.3 Training procedure
The training procedure occurs in two phases. We first train the model of cell and nuclear shape variation, components Encr, Decr, EncDr, DecDr to convergence (Algorithm 1). We then train the conditional model, components Encr,s, Decr,s, EncDs, DecDr,s (Algorithm 2).
In training the model, we adopt three strategies from (Larsen et al., 2015): we limit error signals to relevant networks by propagating the gradient update from any DecD through only Dec, we train the image adversary, DecD, with both generated and reconstructed images, and we weight the gradient update from the discriminators with the scalars γEnc and γDec. The parameters of the model are therefore updated as summarized in Algorithms 1 and 2.
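The sketch below illustrates two of these ideas – restricting adversarial gradients to the decoder and down-weighting them by γ – using toy stand-in networks; it is schematic only, and the actual update rules are given in Algorithms 1 and 2 and in the source code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for Dec^r and DecD^r; the real 3D convolutional architectures differ.
dec = nn.Linear(128, 10)
dec_d = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid())
opt_dec = torch.optim.Adam(dec.parameters(), lr=2e-4)   # only Dec's parameters are updated here
gamma_dec = 1e-5                                         # γ_Dec: weight on the adversarial signal

z = torch.randn(30, 128)
x_hat = dec(z)
adv_loss = F.binary_cross_entropy(dec_d(x_hat), torch.ones(30, 1))  # Dec tries to "fool" DecD
recon_loss = x_hat.pow(2).mean()                         # placeholder for the pixel-wise BCE term

opt_dec.zero_grad()
(recon_loss + gamma_dec * adv_loss).backward()           # adversarial gradient is down-weighted by γ_Dec
opt_dec.step()
```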
2.4 Integrative Modeling
Beyond encoding and decoding images, we leverage the conditional model of structure localization given cell and nuclear shape (Section 2.2) as a tool to predict the localization of multiple unobserved structures, p(xs|xr, y), in a known cell and nuclear representation. In particular, we predict the most probable structure localization given the cell and nuclear channels. The procedure for predicting this localization is outlined in Algorithm 3.
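A hedged sketch of this integration step is given below: the reference channels are encoded, zs is set to the mode of its prior, and the conditional decoder is evaluated once per structure label. The network handles and the use of the prior mode are assumptions for illustration; the exact procedure is given in Algorithm 3.

```python
import torch

def predict_structures(enc_r, dec_rs, x_r, num_structures=11, latent_dim=128):
    """Predict the most probable localization of each structure for given reference channels."""
    with torch.no_grad():
        z_r = enc_r(x_r)                            # reference (cell/nucleus) embedding
        z_s = torch.zeros(x_r.size(0), latent_dim)  # mode of the N(0, I) structure prior
        predictions = []
        for k in range(num_structures):
            y = torch.zeros(x_r.size(0), num_structures)
            y[:, k] = 1.0                           # one-hot label for structure k
            predictions.append(dec_rs(z_r, y, z_s)) # predicted localization of structure k
    return predictions
```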
2.5. Estimate of structure variation
Given the differential relationships between these structures of interest and cell and nuclear shape, we are able to estimate the strength of the relationship between any given structure localization and the corresponding cell and nuclear shape. That is, if we fix the cell and nuclear shape embedding, zr, and structure type, y, we are able to measure how much variability we observe as we sample from the structure localization embedding distribution, zs, resulting in an estimate of structure variation.
3. Results
3.1. Data Set
We use a collection of 3D segmented cell images generated from static 3D spinning disc confocal microscopy images of human induced pluripotent stem cells. These cells are from clonal lines, each gene edited to express mEGFP on a protein that localizes to a specific structure of interest, e.g. α-actinin (actin bundles), α-tubulin (microtubules), β-actin (actin filaments), desmoplakin (desmosomes), fibrillarin (nucleolus), lamin B1 (nuclear membrane), myosin IIB (actomyosin bundles), ST6GAL1 (golgi), Sec61β (endoplasmic reticulum), TOM20 (mitochondria), and ZO1 (tight junctions). Details on the cell lines, microscopy pipeline, and on the source image collection are available via the Allen Cell Explorer at http://www.allencell.org. Briefly, each image consists of channels corresponding to the nuclear signal, cell membrane signal, and a labeled sub-cellular structure of interest (Figure 2). Individual cells were segmented from a field, and each channel was processed by subtracting the most populous pixel intensity, zeroing-out negative-valued pixels, and rescaling image intensity to a value between 0 and 1. The cells were aligned by the major axis of the cell shape, centered according to the center of mass of the segmented nuclear region, and flipped according to image skew. Each of the 21,967 cell images was rescaled to 0.317 μm/px and padded to 128 × 96 × 64 voxels.
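A minimal sketch of the per-channel intensity normalization described above is given below; segmentation, alignment, flipping, and resampling to 0.317 μm/px are omitted, and the function name is illustrative.

```python
import numpy as np

def preprocess_channel(channel):
    """Normalize one image channel as described in Section 3.1."""
    values, counts = np.unique(channel, return_counts=True)
    mode = values[np.argmax(counts)]       # most populous pixel intensity
    channel = channel - mode               # subtract the background level
    channel[channel < 0] = 0               # zero-out negative-valued pixels
    if channel.max() > 0:
        channel = channel / channel.max()  # rescale intensities to [0, 1]
    return channel
```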
3.2. Model implementation
A summary of the model architectures is offered in Section C. The explicit model definitions and training procedures are available in the source code at https://github.com/AllenCellModeling/pytorch_integrated_cell.
We based the architectures and their implementations on a combination of resources, primarily (Larsen et al., 2015; Makhzani et al., 2015; Radford et al., 2015), and Kai Arulkumaran’s Autoencoders package (Arulkumaran, 2017).
We found that adding white noise to the first layer of the decoder adversaries, DecDr and DecDr,s, stabilizes the relationship between the adversary and the autoencoder and improves convergence, as in (Sønderby et al., 2016) and (Salimans et al., 2016).
We choose a 128-dimensional latent space for both Zr and Zs.
3.3. Training
To train the model, we used the Adam optimizer (Kingma & Ba, 2014) to perform gradient descent, with a batch size of 30, a learning rate of 0.0002 for Encr, Decr, DecDr, Encr,s, Decr,s, and DecDr,s, a learning rate of 0.01 for EncDr and EncDs, and γEnc and γDec values of 10−4 and 10−5 respectively. The dimensionality of the latent spaces Zr and Zs was set to 128, and the prior distribution for both is an isotropic Gaussian.
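For concreteness, a sketch of these optimizer settings for the reference-model components is shown below (the conditional-model components are handled analogously); the modules are placeholders, and the real setup lives in the training scripts of the repository.

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for the reference-model networks.
enc_r, dec_r = nn.Linear(10, 128), nn.Linear(128, 10)
enc_d_r, dec_d_r = nn.Linear(128, 1), nn.Linear(10, 1)

batch_size, gamma_enc, gamma_dec = 30, 1e-4, 1e-5
opt_enc_r = torch.optim.Adam(enc_r.parameters(), lr=2e-4)
opt_dec_r = torch.optim.Adam(dec_r.parameters(), lr=2e-4)
opt_dec_d_r = torch.optim.Adam(dec_d_r.parameters(), lr=2e-4)
opt_enc_d_r = torch.optim.Adam(enc_d_r.parameters(), lr=1e-2)  # latent-space discriminator uses the larger rate
```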
We split the data set into 95% training and 5% test (for more details see Table S9), and trained the model of cell and nuclear shape for 150 epochs, and the conditional model for 150 epochs.
The training curves for the reference and conditional model are shown in Figure S3.
The model was implemented in PyTorch (http://pytorch.org) version 0.1.12 and run on 2 NVIDIA V100 graphics cards via NVIDIA-docker. The model took approximately two weeks to train. Further details of our implementation can be found in the software repository (Section 4).
3.4. Experiments
We performed a variety of experiments to explore the utility of the model. While quantitative assessment is paramount, the nature of the data makes qualitative assessment indispensable, so we include assessments of this type in addition to more traditional measures of performance.
3.4.1. Image Classification
While classification is not our primary use case, it is a worthwhile benchmark of a well-functioning multi-class generative model. To evaluate the performance of the class-label identification of Encr,s, we compared the predicted labels against the true labels on our held-out set. A summary of the results of our multinomial classification task is shown in Table S11. Our model is able to accurately classify most structures, and has trouble only with poorly sampled or underrepresented classes.
3.4.2. Image reconstruction
A necessary but not sufficient condition for the model to be useful is that cell images reconstructed from their latent space representations resemble the native images. Examples of image reconstruction from the training and test set are shown in Figure S1 for reference structures and Figure S2 for the structure localization model. The model recapitulates the essential structure localization patterns in the cells, and produces convincing reconstructions of cell and nuclear shape in both the training and test data.
3.4.3. Integrating Cell Images
Conditional upon the cell and nuclear shape, the model predicts the most probable position of any particular structure via Algorithm 3. Examples of integrating structure localization given cell and nuclear shapes are shown in Figure 3.
3.4.4. Evaluation of Generated Image Distribution
To determine how well our model captures the variation in our data, we compute metrics on our input images and compare these to the same metrics computed on model-generated images.
We choose two methods for evaluating this variation: a pixel-wise measure of image similarity and the so-called inception score (Salimans et al., 2016).
As our measure of image similarity, we compute the pixel-wise cross correlation between the structure channels, xs, of every image in our training set. The distribution of this correlation is reported in Figure S6a. As expected, there is a large difference in this correlation across structures. Stereotypically localized structure labels, such as lamin B1 and fibrillarin, correlate highly; punctate cytoplasmic labels, such as desmoplakin, ST6GAL1 and ZO1, are highly variable in their localization in the current model.
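One plausible way to compute this statistic, assuming the pixel-wise cross correlation is the Pearson correlation between flattened structure channels, is sketched below with illustrative array sizes.

```python
import numpy as np

def pairwise_correlation(flat_channels):
    """Pearson correlation between every pair of flattened structure channels x^s."""
    return np.corrcoef(flat_channels)          # (N, N) matrix for N images

flat_channels = np.random.rand(100, 5000)      # illustrative: N images x D voxels
corr = pairwise_correlation(flat_channels)
pairwise_values = corr[np.triu_indices_from(corr, k=1)]  # distribution of pairwise correlations
```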
We compare these distributions to the pairwise cross correlation of encoded-decoded images from both the train and test set (Figures S6c, S6d), as well as images generated by our model de novo (Figure S6b). Comparing the data distribution to the encoded-decoded images in the training set (Figure S6c), we see that the model does not capture the total amount of variation in the data. This is expected, as the autoencoder can be interpreted as a lossy compression method. While these distributions differ slightly, direct inspection of the reconstructed images (structure channels in Figure S2) shows that they are qualitatively similar to the data.
Comparing the correlation between sampled images and the train and test images in (Figures S6c, S6d, S6b), we see that the distributions of the generated images are very similar to the encoded-decoded held-out test data, indicating that the model accurately captures high-level variation in the data in a generalizable manner.
An alternate evaluation metric is the “inception score” introduced above (Salimans et al., 2016):

IS = exp( E_x [ KL( p(y|x) ‖ p(y) ) ] )

The inception score is the exponential of the average, over the input data, of the KL-divergence between the conditional probability of a label given an image and the marginal probability of each label. It is a widely used measure of performance for generative models.
To compute the inception scores, we pass the generated images through the encoder Encr,s and record the predicted class probabilities for each image, p(y | x), from which the marginal p(y) follows trivially. If the generator Decr,s creates representative images, the estimated class distribution of each image, p(y | x), should have low entropy, while the distribution of class probability across all images, p(y), should have high entropy.
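A minimal sketch of this computation is given below, assuming `p_yx` holds the class probabilities p(y | x) recorded from Encr,s for a set of images; it follows the standard definition of the score rather than our exact evaluation code.

```python
import numpy as np

def inception_score(p_yx, eps=1e-12):
    """Inception score from an (N, K) array of per-image class probabilities."""
    p_y = p_yx.mean(axis=0, keepdims=True)                   # marginal p(y)
    kl = (p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))                          # exp of the mean KL divergence

p_yx = np.random.dirichlet(np.ones(11), size=1000)           # illustrative probabilities only
score = inception_score(p_yx)
```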
In Table S10, we report inception scores for our train and test data, and the generated images for each class sampled at the train and test set frequencies. On the test set, our model achieved an inception score of 7.561, compared to a score of 7.823 on the encoded-decoded test data, and a score of 7.876 for raw test data itself.
While the absolute magnitude of these scores is not directly evaluable at present, the similarity of these scores indicates that the variety and quality of the generated structures is on par with those in the data distribution.
3.4.5. Variation in Structure Localization
Given that the model reasonably approximates the data distribution, as described in the previous section, we are interested in how predictive the cell and nuclear shape is of structure localization. That is, how much intrinsic variation is there in the localization of a given structure, and how much of that structure’s localization is dictated by the shape of the cell and nucleus? In the following section, we quantify this relationship to determine the model’s performance in predicting the location and variation of different subcellular structures.
We evaluated the variation the model captures by generating 1000 structure images per cell and nuclear shape and assessing the variation in that structure localization (see Section 2.5) by computing the negative log-determinant of the 1000 × 1000 matrix of pairwise correlations. By exploring this distribution of structure localization, we assessed the conditions in which structure organization is tightly coupled to cell and nuclear shape. That is, for a given cell shape, the localization of one structure type may be highly variable as one moves through the structure latent space, while the location of another may not be as variable. Furthermore, it can be imagined that these relationships would vary from cell state to cell state.
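A sketch of this statistic is shown below, assuming the sampled structure images for one fixed cell and nuclear shape have been flattened into rows of an array; the sizes are illustrative (1000 samples were used in practice).

```python
import numpy as np

def neg_log_det_correlation(samples):
    """Negative log-determinant of the pairwise correlation matrix of sampled structure images."""
    corr = np.corrcoef(samples)                # (N, N) pairwise correlations
    sign, logdet = np.linalg.slogdet(corr)     # numerically stable log-determinant
    return -logdet

samples = np.random.rand(50, 2000)             # illustrative; 1000 x (number of voxels) in practice
stat = neg_log_det_correlation(samples)        # smaller values indicate more variation
```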
We report the distribution of per-cell summary statistics for cell and nuclear shapes in our train and test set in Figure S7.
In addition to assessing structure-wise variability and accuracy, we can also investigate the relationships between these metrics and the cell cycle. Using expert annotations for a subset of 7592 of the cells, which assign each cell to one of eight phases of the cell cycle ({interphase, prophase, prophase 2, pro metaphase 1, pro metaphase 2, metaphase, anaphase, telophase-cytokinesis}), we subdivide our population into two groups: cells labeled as interphase, and cells that are not. As seen in Figure S7, the distributions of variation per structure show interesting biological correlation: in mitosis, we see that there is more variation (a shift toward smaller negative log-determinant) in the localization of lamin B1, fibrillarin, ST6GAL1, and Tom20, while the distribution of α-tubulin is less variable. From a biological point of view, this makes sense: the nuclear envelope breaks down in mitosis, entailing less stereotypy in the location of lamin B1 and fibrillarin, with corresponding fragmentation of mitochondria (Tom20) and Golgi (ST6GAL1), while α-tubulin coalesces into spindles that are highly stereotyped in location.
On a per-structure basis, we are also able to sort the cells in our input data according to which cell and nuclear shape corresponds to both the most and the least varied structure predictions, as shown in Figure S8. Structures such as lamin B1, fibrillarin, ST6GAL1, and Tom20 show a clear enrichment for mitotic cells in the most variable structure predictions, as compared to the least variable predictions, while the opposite is true of α-tubulin. This example for mitosis suggests that the present model is able to learn how the variability in the localization of different subcellular structures may be coupled to cell state.
3.4.6. Latent space traversal
We further explore the generative capacity of our model by mapping out the variation in cell morphology and structure localization via a traversal of the reference latent space with respect to cells in ordered stages of mitosis as introduced in the previous section.
Given the expert manual annotation of cell cycle for 7592 of our images and their corresponding cell and nuclear shape representations, zr, we select medoid landmark cells chosen from the test data as representative of each ordered stage in mitosis.
In our traversal of the 128 dimensional latent space of cell and nuclear shape, we interpolate linearly in radial coordinates between each medoid of our annotated stages, from interphase through metaphase. Samples along this path are shown in Figure S9.
As we move between stages of the cell cycle in the reference latent space, we also predict the localization of each of our structures of interest by generating the most probable structure channel localizations from the center of the structure latent space.
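One plausible reading of “interpolating linearly in radial coordinates” is to interpolate the norm and direction of the latent vectors separately, as sketched below; this is an illustrative assumption rather than the exact traversal scheme used for Figure S9.

```python
import numpy as np

def radial_interpolate(z_a, z_b, n_steps=10):
    """Interpolate between two latent vectors by blending their norms and directions separately."""
    points = []
    for t in np.linspace(0.0, 1.0, n_steps):
        direction = (1 - t) * z_a / np.linalg.norm(z_a) + t * z_b / np.linalg.norm(z_b)
        direction /= np.linalg.norm(direction)                  # keep a unit direction
        radius = (1 - t) * np.linalg.norm(z_a) + t * np.linalg.norm(z_b)
        points.append(radius * direction)
    return np.stack(points)

path = radial_interpolate(np.random.randn(128), np.random.randn(128))  # illustrative medoids
```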
Although these cell cycle expert annotations were not used during model training, they offer an opportunity to investigate how well the model generalizes to unseen data along biologically important axes of variation. These predictions of localization demonstrate smooth transitions and plausible localization patterns as the cell shape progresses from interphase through metaphase on held-out data, demonstrating the ability of our model to capture and predict complex relationships in structure organization and potentially other key biological phenomena given new data.
4. Discussion
Building models that capture the relationships between the morphology and organization of cell structures is a difficult but important challenge. While previous research has focused on constructing subcellular structure-specific parametric approaches, due to the extreme variation in localization among different subcellular structures, these approaches may not be convenient to employ for all structures under all conditions. Here, we have presented a nonparametric conditional model of structure organization that generalizes well to a wide variety of localization patterns, encodes the variation in cell structure and organization, allows for a probabilistic interpretation of the image distribution, and generates high quality three-dimensional, integrated synthetic images.
Our model of cell and subcellular structure differs from previous generative models (Zhao & Murphy, 2007; Peng & Murphy, 2011; Johnson et al., 2015): we directly model fluorescent label localization, rather than the detected objects and their boundaries. While object segmentation can be essential in certain contexts, and helpful in others, when these approaches are not necessary, it can be advantageous to omit these non-trivial intermediate steps. Our model does not constitute a “cytometric” approach (i.e. counting objects), but since we are directly modeling the localization of signal, we drastically reduce the modeling time by minimizing the amount of segmentation and the task of evaluating this segmentation with respect to the “ground truth”.
Even considering these differences, the model is compatible with existing frameworks and allows for mixed parametric and non-parametric localization relationships, since the model can be used for predicting the localization of structures when an appropriate parametric representation may not exist.
Since the posting of our initial preprint manuscript and software (Johnson et al., 2017), there has been increased interest in the application of GANs to modeling cell morphology and organization. (Osokin et al., 2017) constructed a network that generates images of cell shape and generates additional subcellular structures conditioned on those. (Goldsborough et al., 2017) used a GAN to generate images of cells across different conditions and evaluated the performance of a discriminator in identifying the mechanism of action of these conditions. These GAN approaches (Osokin et al., 2017; Goldsborough et al., 2017) allow neither a probabilistic interpretation nor prediction of structure localization in real images. Furthermore, the methods presented above generate 2D images using one to two orders of magnitude more training data than the method presented here. Finally, in contrast to these models, our method allows for a reversible mapping of images to a low-dimensional latent space, which permits the prediction of novel subcellular localization patterns in extant data.
Our model permits several straightforward extensions, including extension to time series. Because of the flexibility of our latent-space representation, we can potentially explicitly encode information regarding known cell states and/or perturbations, such as position in the cell cycle, time after drug treatment, or “distance” along a differentiation pathway. Given sufficient information, it would be possible to encode a representation of a “structure space” to predict the localization of unobserved structures, or a “perturbation space”, as in (Paolini et al., 2006). Coupling this with active learning approaches (Naik et al., 2016) opens the way to building models that learn and encode the localization of diverse subcellular structures under a variety of different conditions.
Software and Data
The code for running the models used in this work is available at https://github.com/AllenCellModeling/pytorch_integrated_cell
The data used to train the model is available at s3://aics.integrated.cell.arxiv.paper.data.
Author Contributions
GRJ and RMD conceived, designed and implemented experiments. GRJ, RMD, and MMM wrote the paper.
Acknowledgements
We would like to thank Robert F. Murphy, Julie Theriot, Rick Horwitz, Graham Johnson, Forrest Collman, Sharmishtaa Seshamani and Fuhui Long for their helpful comments, suggestions, and support in the preparation of the manuscript.
Furthermore, we would like to thank all members of the Allen Institute for Cell Science team, who generated and characterized the gene-edited cell lines, developed image-based assays, and recorded the high replicate data sets suitable for modeling. We particularly thank Liya Ding for segmentation data and Irina Mueller for mitosis annotations. These contributions were absolutely critical for model development.
We would like to thank Paul G. Allen, founder of the Allen Institute for Cell Science, for his vision, encouragement and support.
Footnotes
gregj@alleninstitute.org
rorydm@alleninstitute.org
mollym@alleninstitute.org