3D Cell Nuclear Morphology: Microscopy Imaging Dataset and Voxel-Based Morphometry Classification Results

Cell deformation is regulated by complex underlying biological mechanisms associated with spatial and temporal morphological changes in the nucleus that are related to cell differentiation, development, proliferation, and disease. Thus, quantitative analysis of changes in the size and shape of nuclear structures in 3D microscopic images is important not only for investigating nuclear organization, but also for detecting and treating pathological conditions such as cancer. While many efforts have been made to develop cell and nuclear shape characteristics in 2D or pseudo-3D, several studies have suggested that 3D morphometric measures provide better results for nuclear shape description and discrimination. A few methods have been proposed to classify cell and nuclear morphological phenotypes in 3D; however, there is a lack of publicly available 3D data for the evaluation and comparison of such algorithms. This limitation becomes important when the ability to evaluate different approaches on benchmark data is needed for better dissemination of the current state-of-the-art methods for bioimage analysis. To address this problem, we present a dataset containing two different cell collections, including original 3D microscopic images of cell nuclei and nucleoli. In addition, we perform a baseline evaluation of a number of popular classification algorithms using 2D and 3D voxel-based morphometric measures. To account for batch effects while enabling calculation of the AUROC and AUPR performance metrics, we propose a specific cross-validation scheme that we compare with commonly used k-fold cross-validation. Original and derived imaging data are made publicly available on the project webpage: http://www.socr.umich.edu/projects/3d-cell-morphometry/data.html.


INTRODUCTION
Morphology of a cell nucleus and its compartments is regulated by complex biological mechanisms related to cell differentiation, development, proliferation, and disease [1]. Changes in the nuclear form are associated with reorganization of chromatin architecture and related to altered functional properties such as gene regulation. Conversely, external geometric constraints and mechanical forces that deform the cell nucleus affect chromatin dynamics and gene and pathway activation [2]. Thus, nuclear morphological quantification gains relevance as studies of the reorganization of chromatin and DNA architecture in the spatial and temporal framework, known as the 4D nucleome, emerge [3]. Quantitative analyses of nuclear and nucleolar morphological changes also have medical implications, for example, in the detection and treatment of pathological conditions such as cancer [4,5].
Many algorithms have been proposed to classify cell and nuclear morphological phenotypes using 3D representations [6] that are often more informative as nuclear shape descriptors than 2D features [7,8]. However, there is a lack of publicly available 3D cell image datasets that could serve for the comparison of various tools for 3D cell nuclear morphometry. This limitation becomes important in the modern reality of big data microscopy, when the ability to evaluate different approaches on benchmark data is needed for better dissemination of the current state-of-the-art methods for bioimage analysis [9].
In order to objectively evaluate methods for 3D cell nuclear morphometry, we built and made publicly available a dataset of segmented and curated nuclear and nucleolar masks from fluorescence microscopy images. We then extracted common voxel-based measures of mask morphology in 2D and 3D, and evaluated a number of classification algorithms to provide performance baselines.

DATASET PREPARATION
The dataset is composed of two different cell collections.
The first collection consists of Fibroblast cells (newborn male) that were purchased from ATCC (BJ Fibroblasts CRL-2522 normal) and subjected to a G0/G1 Serum Starvation Protocol [10], which has been shown to affect nuclear morphology [11]. This protocol provided us with images of the following conditions or phenotypes: cell cycle synchronized by serum starvation (SS) and proliferating (PROLIF).
The second collection contains human prostate cancer cells (PC3). Through the course of progression to metastasis, malignant cancer cells undergo a series of reversible transitions between intermediate phenotypic states bounded by pure epithelium and pure mesenchyme [5]. These transitions in prostate cancer are associated with quantifiable changes in nuclear and nucleolar structure [12,13]. Microscope slides of prostate cancer cell line PC3 were cultured in epithelial (EPI) and mesenchymal transition (EMT) phenotypic states.

Image acquisition
Cells in both collections were labeled with 3 different fluorophores: DAPI (4',6-diamidino-2-phenylindole), a common stain for the nuclei, fibrillarin antibody (anti-fibrillarin) and ethidium bromide (EtBr), both used for nucleoli staining. Although anti-fibrillarin is a commonly used nucleolar label, we found it to be too specific, which made extraction of a shape mask problematic. It has been shown that EtBr can be used for staining dense chromatin, nucleoli, and ribosomes [14]. We found that it provides better overall representation of nucleolar shape. Anti-fibrillarin was combined with EtBr by colocalization to confirm correct detection of nucleoli locations as described below. 3D imaging used a Zeiss LSM 710 laser scanning confocal microscope with a 63x PLAN/Apochromat 1.4NA DIC objective.
For multichannel vendor data, the channels were separated and saved as individual volumes labeled as c0, c1, c2, representing the DAPI, anti-fibrillarin, and EtBr channels, respectively, Fig. 1A. Each channel-specific volume was then re-sliced into a 1,024 × 1,024 × Z lattice (Z = {35, 50}), where regional sub-volumes facilitated the alignment with the native tile size of the microscope. All sub-volumes were saved as multi-image 3D TIFF volumes. For every sub-volume, accompanying vendor meta-data was extracted from the original data.
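The channel separation and re-slicing step can be illustrated with a minimal NumPy sketch. It is not the pipeline used to build the dataset: the (C, Z, Y, X) axis order, the function name split_channels_into_tiles, the dictionary keys, and the choice to drop incomplete border tiles are all assumptions made for illustration, and writing the TIFF files is left out.

```python
import numpy as np


def split_channels_into_tiles(stack, tile=1024):
    """Split a (C, Z, Y, X) stack into per-channel (Z, tile, tile) sub-volumes.

    Hypothetical helper: axis order and key layout are assumptions.
    Incomplete tiles at the image border are dropped.
    """
    n_channels, _, height, width = stack.shape
    sub_volumes = {}
    for c in range(n_channels):
        channel = stack[c]  # one (Z, Y, X) volume, labeled c0, c1, c2, ...
        for row, y in enumerate(range(0, height - tile + 1, tile)):
            for col, x in enumerate(range(0, width - tile + 1, tile)):
                key = (f"c{c}", row, col)
                # Keep the full Z extent; cut only in the XY plane.
                sub_volumes[key] = channel[:, y:y + tile, x:x + tile]
    return sub_volumes
```

Each value is a view into the input array, so a sub-volume should be copied before it is modified.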

Segmentation
We performed automatic 3D segmentation of nuclei using the Farsight toolkit's Nuclear Segmentation tool [15], Fig. 1B. This tool was created specifically to segment DAPI-stained nuclei in 2D and 3D, does not require a labeled training set, and demonstrated stable results on these data. After segmentation, each segmented nucleus was represented as a mask, Fig. 1D. Post-segmentation processing of nuclear masks included 3D hole filling and a filtering step that removed objects that spanned the edge of a tile, were connected to other objects, or had a voxel count below or above empirically estimated thresholds.
Since the Farsight toolkit's Nuclear Segmentation tool is not well suited for segmentation of cellular components other than nuclei, the classification of objects within the nucleus was performed using Trainable Weka Segmentation [16]. Nuclear masks were used to restrict sub-nuclear segmentations in the EtBr and anti-fibrillarin channels to objects within a nucleus. An individual classifier model was created for each channel by using a random selection of 10% of the sub-volumes within that channel for training, Fig. 1C. Trained models were then applied to all sub-volumes, and nucleolar masks were created from the resulting probability maps and labeled as connected components. The nucleolar mask quality control protocol was similar to that of the nuclear masks, with an additional filter for the spherical compactness of identified objects [17]. Finally, both EtBr and anti-fibrillarin segmented volumes were used as input to a co-localization algorithm to validate the segmented EtBr-stained nucleoli based on the presence of anti-fibrillarin, Fig. 1D.
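The post-segmentation filtering of nuclear masks can be sketched as follows, using SciPy's ndimage module rather than the tooling actually used for the dataset. The function name filter_nuclear_masks, the (Z, Y, X) axis order, the decision to test only the in-plane tile borders, and the threshold arguments are assumptions; the thresholds reported as empirically estimated are not given in the text. The check for objects connected to other objects is not reproduced here.

```python
import numpy as np
from scipy import ndimage as ndi


def filter_nuclear_masks(binary_volume, min_voxels, max_voxels):
    """Return one filled boolean mask per object that passes the filters.

    Hypothetical helper: thresholds and axis order are assumptions.
    """
    filled = ndi.binary_fill_holes(binary_volume)  # 3D hole filling
    labels, _ = ndi.label(filled)                  # connected components
    kept = []
    for label_id, bbox in enumerate(ndi.find_objects(labels), start=1):
        # Assumed axis order (Z, Y, X): test only the Y and X tile borders.
        touches_edge = any(
            s.start == 0 or s.stop == size
            for s, size in zip(bbox[1:], labels.shape[1:])
        )
        mask = labels == label_id
        n_voxels = int(mask.sum())
        if not touches_edge and min_voxels <= n_voxels <= max_voxels:
            kept.append(mask)
    return kept
```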
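The co-localization algorithm is not specified in detail here, so the following is only one plausible reading: an EtBr-stained object inside a nucleus is kept when it overlaps at least one anti-fibrillarin voxel. The function name colocalized_nucleoli and the single-voxel overlap criterion are assumptions, not the published procedure.

```python
import numpy as np
from scipy import ndimage as ndi


def colocalized_nucleoli(etbr_mask, fibrillarin_mask, nuclear_mask):
    """Keep EtBr objects inside the nucleus that overlap anti-fibrillarin.

    Hypothetical criterion: any shared voxel counts as co-localization.
    Returns the validated boolean mask and the number of validated objects.
    """
    etbr = np.asarray(etbr_mask, dtype=bool)
    fib = np.asarray(fibrillarin_mask, dtype=bool)
    nucleus = np.asarray(nuclear_mask, dtype=bool)

    # Restrict the EtBr segmentation to the nucleus, then label components.
    labels, _ = ndi.label(etbr & nucleus)
    # Labels of EtBr objects that contain at least one anti-fibrillarin voxel.
    overlapping = np.unique(labels[fib & nucleus])
    overlapping = overlapping[overlapping > 0]
    validated = np.isin(labels, overlapping)
    return validated, len(overlapping)
```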

Dataset structure and archiving
Each collection in the final dataset consisted of sets of 3D segmented sub-volumes in the TIFF format from channel c0 (DAPI), representing binary nuclear masks, accompanied by a set of 3D binary masks of nucleoli from channel c2 (EtBr) per nucleus. To expand the potential use of the dataset, we also included the original unsegmented 1,024 × 1,024 × Z TIFF sub-volumes in all 3 channels with corresponding meta-data, Fig. 1E. Both collections were grouped and archived per original volume for easier downloading, and made publicly available on the project webpage: http://www.socr.umich.edu/projects/3d-cell-morphometry/data.html. Each archive also contains a README file with a text description of the file structure.

MORPHOMETRIC CLASSIFICATION
To establish baseline morphometry classification results, we extracted multiple voxel-based morphometric characteristics from 3D binary masks and their 2D maximum intensity projections (2D masks). We used these features to evaluate the performance of a number of widely used classification algorithms. We also assessed possible batch effects in the data by comparing two different cross-validation techniques.

Voxel-based morphometry
We used the scikit-image image processing library [18] to extract sets of 2D and 3D voxel-based morphometric features from both nuclear and nucleolar binary masks.
The set of 3D morphometry features included: object volume, volume of the 3D bounding box, volume of the filled region, diameter of a sphere with the same volume as the object, and ratio of the object volume to the bounding box volume.
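The five 3D measures can be written out directly, which makes their definitions explicit. The sketch below uses NumPy and SciPy instead of scikit-image and is not the extraction code used for the dataset; the function name features_3d and the dictionary keys are assumptions, and all values are in voxel units with physical voxel spacing ignored. The equivalent diameter follows from the volume of a sphere, V = (π/6) d^3, so d = (6V/π)^(1/3).

```python
import numpy as np
from scipy import ndimage as ndi


def features_3d(mask):
    """Voxel-based 3D morphometry of one binary mask (voxel units).

    Hypothetical helper: names and units are assumptions.
    """
    mask = np.asarray(mask, dtype=bool)
    volume = int(mask.sum())
    if volume == 0:
        raise ValueError("mask is empty")

    # Axis-aligned bounding box from the extent of foreground coordinates.
    coords = np.nonzero(mask)
    bbox_volume = int(np.prod([c.max() - c.min() + 1 for c in coords]))

    # Volume after filling fully enclosed holes.
    filled_volume = int(ndi.binary_fill_holes(mask).sum())

    # Diameter of a sphere with the same volume as the object.
    equivalent_diameter = (6.0 * volume / np.pi) ** (1.0 / 3.0)

    return {
        "volume": volume,
        "bbox_volume": bbox_volume,
        "filled_volume": filled_volume,
        "equivalent_diameter": equivalent_diameter,
        "extent": volume / bbox_volume,  # object volume / bounding box volume
    }
```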
The 2D feature set included 2D analogs of all 3D measures and was supplemented by: convex hull area, eccentricity, Euler number, filled region area, eigenvalues of the inertia tensor of the region, major and minor axes of an ellipse fitted to the region, the angle between the X-axis and the major axis of the fitted ellipse, the perimeter of an object that approximates the contour of the region, and the ratio of the region area to the convex hull area.
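A minimal sketch of 2D feature extraction from a maximum intensity projection, using scikit-image's regionprops. For a binary mask the projection is the union of the foreground across slices. Only a subset of the listed measures is shown; the function name features_2d, the choice of axis 0 as Z, and keeping only the largest projected region are assumptions.

```python
import numpy as np
from skimage.measure import label, regionprops


def features_2d(mask_3d):
    """2D morphometry of the maximum intensity projection of a 3D mask.

    Hypothetical helper: axis 0 is assumed to be Z.
    """
    # Maximum intensity projection along Z; for a binary mask this is a union.
    projection = np.asarray(mask_3d, dtype=bool).max(axis=0)
    regions = regionprops(label(projection))
    if not regions:
        raise ValueError("mask is empty")
    region = max(regions, key=lambda r: r.area)  # largest projected object
    return {
        "area": float(region.area),
        "perimeter": float(region.perimeter),
        "eccentricity": float(region.eccentricity),
        "euler_number": int(region.euler_number),
        # solidity is the ratio of region area to convex hull area
        "solidity": float(region.solidity),
        "inertia_tensor_eigvals": tuple(
            float(v) for v in region.inertia_tensor_eigvals
        ),
    }
```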
In order to aggregate the nucleolar features per nucleus, we computed the median, minimum, maximum, and standard deviation of each morphometry measure across the nucleoli within the nucleus. Consequently, nuclei that did not have any internally positioned nucleoli were excluded from further analysis. The number of detected nucleoli per nucleus was included as an individual feature.
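The per-nucleus aggregation can be sketched in a few lines of NumPy. The function name aggregate_nucleolar_features, the ordering of the output vector, and the use of the population standard deviation (NumPy's default, ddof=0) are assumptions; the text does not state which estimator was used.

```python
import numpy as np


def aggregate_nucleolar_features(nucleolar_features):
    """Aggregate an (n_nucleoli, n_features) array into one vector.

    Returns None for a nucleus without nucleoli, which is then excluded.
    Output order (an assumption): medians, minima, maxima, standard
    deviations, followed by the nucleolus count.
    """
    values = np.asarray(nucleolar_features, dtype=float)
    if values.ndim != 2 or values.shape[0] == 0:
        return None
    return np.concatenate([
        np.median(values, axis=0),
        values.min(axis=0),
        values.max(axis=0),
        values.std(axis=0),   # population standard deviation (ddof=0)
        [values.shape[0]],    # number of detected nucleoli
    ])
```

With F nucleolar measures, each nucleus contributes 4F + 1 nucleolar features in this layout.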

Classification
We compared various supervised classification algorithms from scikit-learn, a popular Python machine learning toolkit [19], including Gaussian Naive Bayes (NB), Linear Discriminant Analysis (LDA), k nearest neighbors classifier (kNN), support vector machines with linear (SVM) and Gaussian kernels (RBF), Random Forest (RF), Extremely Randomized Trees (ET), and Gradient Boosting (GBM). All classifiers used default hyper-parameters. Feature preprocessing included feature standardization by removing the mean and scaling to unit variance of the training set.
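The classifier set can be sketched with scikit-learn pipelines so that the scaler is fitted on the training data only. This is an illustration rather than the evaluation code used here: the function name make_baseline_classifiers is hypothetical, SVC(kernel="linear") is an assumed choice for the linear SVM, and random_state is fixed only to make the sketch reproducible.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_baseline_classifiers(random_state=0):
    """Build one standardize-then-classify pipeline per algorithm.

    Hypothetical helper; hyper-parameters are library defaults except the
    kernel choice and the random seed.
    """
    estimators = {
        "NB": GaussianNB(),
        "LDA": LinearDiscriminantAnalysis(),
        "kNN": KNeighborsClassifier(),
        "SVM": SVC(kernel="linear"),  # assumed linear SVM implementation
        "RBF": SVC(kernel="rbf"),
        "RF": RandomForestClassifier(random_state=random_state),
        "ET": ExtraTreesClassifier(random_state=random_state),
        "GBM": GradientBoostingClassifier(random_state=random_state),
    }
    # StandardScaler learns mean and variance when the pipeline is fitted,
    # so test samples never influence the scaling.
    return {
        name: make_pipeline(StandardScaler(), estimator)
        for name, estimator in estimators.items()
    }
```

Calling fit on a pipeline with the training split and predict on the test split applies the training-set mean and variance to both.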
To evaluate the possible batch effect that could occur during image acquisition, we compared the traditional k-fold cross-validation (CV) scheme with the suggested Leave-2-Opposite-Groups-Out (L2OGO) scheme. L2OGO ensures that: (1) all masks derived from one image fall either in the training or the testing set, and (2) the testing set always contains masks from 2 images of different classes. Unlike Leave-One-Group-Out CV, L2OGO enables per-split evaluation of performance metrics such as the Area Under the Precision-Recall curve (AUPR) and the Area Under the Receiver Operating Characteristic curve (AUROC). Since the original volumes differed in size and in the number of nuclei they contained, we joined smaller volumes into common groups to address class imbalance in the testing sets and reduce the variance of the performance metric estimates. Given the class imbalance in L2OGO, we used AUROC (reported below as AUC), AUPR, and F1 score to compare algorithms [20].
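The two L2OGO constraints can be expressed as a small split generator. The sketch below is a plain-Python illustration, not the original implementation: the function name l2ogo_splits and its arguments are hypothetical, and enumerating every pair of opposite-class groups is an assumption about how the splits are formed.

```python
from itertools import product


def l2ogo_splits(sample_groups, group_class):
    """Yield (train_indices, test_indices) for Leave-2-Opposite-Groups-Out.

    sample_groups: group (image) identifier for every sample.
    group_class:   mapping from group identifier to its class label.
    Hypothetical helper: every pair of opposite-class groups is held out once.
    """
    classes = sorted(set(group_class.values()))
    if len(classes) != 2:
        raise ValueError("L2OGO is defined here for exactly two classes")
    first = [g for g in sorted(group_class) if group_class[g] == classes[0]]
    second = [g for g in sorted(group_class) if group_class[g] == classes[1]]

    for group_a, group_b in product(first, second):
        held_out = {group_a, group_b}
        # A whole group goes to either the training or the testing set.
        test = [i for i, g in enumerate(sample_groups) if g in held_out]
        train = [i for i, g in enumerate(sample_groups) if g not in held_out]
        yield train, test
```

Under this enumeration, m groups of one class and n groups of the other give m × n splits, and every test set contains both classes, so AUROC and AUPR are defined for each split.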

Fibroblast cells classification
After segmentation and exclusion of nuclei without detected nucleoli, the full collection of fibroblasts consisted of a total of 965 nuclei (498 SS and 470 PROLIF) and 2,181 nucleoli (1,151 SS and 1,030 PROLIF) extracted from a total of 11 volumetric images (7 SS and 4 PROLIF). 2D and 3D morphometric measures of nuclear and nucleolar masks were merged into per-nucleus feature vectors as described above. We evaluated the performance of algorithms for Fibroblast morphometric classification using 2 different CV schemes: 20 splits in L2OGO and 4-fold CV repeated 7 times.
Results shown in Fig. 2 do not demonstrate any batch effects in 2D classification. However, 3D performance of all classifiers using L2OGO was lower compared to 4-fold CV, which indicates the possibility of batch effects and overoptimistic classification results in 4-fold CV. As expected, L2OGO led to a large variance of metrics, especially in the F1 score, which can be a result of classifiers' sensitivity to the class imbalance. Using L2OGO, a number of algorithms showed higher performance on 3D morphometry with the best result by a Gaussian SVM (RBF) classifier with

PC3 cells classification
After exclusion of nuclei without detected nucleoli, the segmented PC3 collection consisted of 458 nuclear (310 EPI and 148 EMT) and 1,101 nucleolar (649 EPI and 452 EMT) masks extracted from a total of 6 volumetric images (2 EPI and 4 EMT). After merging smaller EMT groups, the L2OGO scheme produced 4 pairs of groups as training and testing sets. We compared L2OGO to 4-fold CV repeated 2 times.
As in the previous experiment, 2D morphometry classification performance was similar for both CV schemes, see Fig. 3. However, in 3D, the performance of algorithms degraded markedly when using L2OGO CV, such that no method performed better than in 2D. In this case, the best classification by a single classifier was obtained with a Gradient Boosting classifier (GBM), with median AUC = 0.774 ± 0.017, AUPR = 0.875 ± 0.019, F1 = 0.818 ± 0.018.
Results of classification on both collections suggest that the combination of voxel-based morphometry and common algorithms with default parameters can provide good baseline performance, especially in 2D. Using 3D masks can improve performance, as it did in Fibroblast classification. However, the PC3 results suggest that the third dimension can sometimes intensify batch effects and thus require more complex validation schemes.

DISCUSSION
3D cell microscopy is a powerful technique that enables investigation of biological mechanisms related to morphological changes in the cell nucleus through quantitative analysis of changes in its size and shape. However, a lack of publicly available 3D cell image datasets limits the comparison of various 3D cell nuclear morphology analysis solutions. To address this limitation, we presented a new dataset that consists of two collections of 3D segmented binary masks from 2 different cell types and contains a total of 1,433 segmented nuclear and 3,282 nucleolar 3D binary masks that can be used for evaluation of cell nuclear and nucleolar morphological classification methods. To account for batch effects while enabling calculation of the AUROC and AUPR performance metrics, we proposed a specific cross-validation scheme (L2OGO). We compared a number of commonly used machine learning classification algorithms on both collections of data using voxel-based morphometric measures extracted from the original 3D binary masks as well as their 2D maximum intensity projections. Classification results provide a baseline that can be used for future comparison of morphological classifiers. As a limitation of this work, the microscope settings did not meet the Nyquist sampling rate and may have created distortions in the digitized images [21].
Imaging protocols, original and segmented data, and the source code are made publicly available on the project webpage: http://www.socr.umich.edu/projects/3d-cell-morphometry/data.html