A machine learning model for screening of body fluid cytology smears

Introduction Body fluid cytology is one of the commonest investigations performed in indoor patients, both for diagnosis of suspected carcinoma and for staging of known carcinoma. Carcinoma is diagnosed in body fluids by the pathologist through microscopic examination and a search for malignant epithelial cell clusters. Screening body fluid smears is a time-consuming and error-prone process. Aim We have attempted to construct a machine learning model which can screen body fluid cytology smears for malignant cells. Materials and methods MGG stained ascitic/pleural fluid cytology smears from 21 cases (14 malignant, 07 benign) were included in this study. A total of 693 microphotographs were taken at 40x magnification at the same illumination and after correction of white balance. A Magnus Microphotography system was used for photography. The images were split into the training set (195 images), test set (120 images) and validation set (378 images). A machine learning model, a convolutional neural network, was developed in the Python programming language using the Keras deep learning library. The model was trained with the images of the training set. After completion of training, the model was evaluated on the test set of images. Results Evaluation of the model on the test set produced a sensitivity of 97.87%, specificity of 85.26%, PPV of 95.18% and NPV of 93.10%. In 06 images, the model failed to detect singly scattered malignant cells/clusters. The model reported 14 (3.7%) false positives. The machine learning model shows potential utility as a screening tool. However, it needs improvement in detecting singly scattered malignant cells and filtering inflammatory infiltrate.


Introduction
Ascitic fluid cytology is one of the first tests conducted in a patient with ascites, both for confirmation of a suspected malignancy and staging of a known malignancy. The detection of malignant cells in ascitic fluid is carried out by the pathologist by light microscopic examination.
The sensitivity of this method in detecting malignancy has been found to be 50% to 96.7% in different studies. [1] [2] [3] [4] One large case series found that ovarian malignancy is the only malignancy yielding a significant rate of detection from ascitic fluid cytology. [1] A significant proportion of cases are reported as either false positive or false negative in ascitic fluid cytology. In one study of 170 patients, peritoneal cytology showed false negative results in 30.02% cases, while in 6.38% the results were false positive. [3] This large variability in accuracy may be attributable to the often sparse and uneven distribution of malignant cell clusters in the smear. Thus screening slides for malignant cells is often a time-consuming and error-prone process. In one study, the false negatives were attributed to sampling errors in 71% of pleural and 73% of peritoneal effusions and to screening errors in 29% and 27%, respectively. [4] Immunostains for epithelial markers such as B72.3, MOC-31, and Ber-EP4 have been suggested for easy identification of such malignant clusters and also to confirm their epithelial origin. [5] Preparation of cell blocks adds 10-15% to the diagnostic sensitivity of body fluid cytology. [6] [7] In addition, immunohistochemical markers such as CEA significantly enhance the diagnostic efficacy: a combination of calretinin negative and CEA positive staining showed 97% sensitivity and 100% specificity for malignancy in one study of 50 cell blocks. [8] However, a large case series spanning 5.5 years also showed that use of IHC markers produces a higher rate of indeterminate but not malignant diagnosis [9], thus undermining the utility of IHC.
Artificial neural networks (ANNs) are a large family of trainable models, where each subfamily of models is optimized for different functions. We have chosen the ANN subfamily known as convolutional neural networks (CNNs), which have been shown to perform image-based object classification. [10] CNNs take a whole image as input and classify the image into defined categories.
The input image is passed through multiple "layers", each layer comprising multiple linear convolutional filters. [11] The input for each layer is the output of the previous one, with an overlaid non-linearity. The image "features" extracted by the layers are finally fed into a classifier that determines the category the image belongs to. CNNs have been described by Karpathy et al. [12] Deep, multilayered CNNs have been successful in recognising everyday objects [11,13] and classifying them in correct categories [14]. We have aimed to apply the principles of CNNs to identify malignant cell clusters from ascitic fluid cytology smears. The model should be able to correctly identify malignant cell clusters from microphotographs of ascitic fluid cytology smears, and present the foci identified as 'malignant' to the pathologist for review. The proportion of false negatives and false positives will have to be kept at a minimum. The model would thus serve as a pathologist's assistant and a screening tool.
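The layer structure described above (linear convolutional filters followed by a non-linearity) can be illustrated with a minimal pure-Python sketch. This is not the model used in the study, which was built with Keras; the function names, the toy 4 x 4 image and the edge-detecting filter are all hypothetical and chosen only to show the arithmetic of a single layer.

```python
def conv2d(image, kernel):
    """Slide one filter over a grayscale image (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped,
    so this is strictly a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Element-wise non-linearity laid over the linear filter output."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def conv_layer(image, kernels):
    """One layer: several filters, each followed by the non-linearity."""
    return [relu(conv2d(image, k)) for k in kernels]


# Hypothetical toy input: a 4x4 image with a vertical dark-to-bright edge.
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical 2x2 filter that responds to such an edge.
vertical_edge = [[-1, 1], [-1, 1]]

feature_maps = conv_layer(toy_image, [vertical_edge])
```

In this toy case the single feature map is non-zero only in the column where the edge lies. A trained CNN learns its filter weights from the training images and stacks many such layers before the final classifier.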

Materials and methods
Cases were selected from the archives of two hospitals. May Grunwald Giemsa stained ascitic fluid/pleural fluid cytology smears from 21 cases (14 malignant, 07 benign) were included in this study. Of the 14 malignant cases, all except the one case of carcinoma of unknown primary site were confirmed by histopathology. The presence of malignant epithelial cells in ascitic/pleural fluid was confirmed independently by two pathologists.
A total of 693 microphotographs were captured at 40x magnification at the same illumination and after correction of white balance. The foci were manually chosen so as to represent well-defined benign or malignant cells, and were reviewed by two independent pathologists. A Magnus Integrated Microphotography system was used for photography. The photographed images were 1280 x 960 pixels in resolution.
The images were split into three sets:
1. a training set (195 images) - for training of the machine learning model (i.e. to learn the points of difference between the appearance of benign and malignant cells in cytology)
2. a test set (120 images) - for concurrent performance evaluation during training
3. a validation set (378 images) - to evaluate the performance of the model after completion of training
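A three-way split of this kind can be sketched as follows, using the set sizes reported in the abstract (195, 120 and 378 images). The source does not state how images were assigned to each set, so the seeded random shuffle below is an assumption, and the function name and file names are hypothetical.

```python
import random


def split_dataset(items, n_train, n_test, n_val, seed=0):
    """Shuffle items and cut them into three disjoint sets of the given sizes."""
    if n_train + n_test + n_val != len(items):
        raise ValueError("set sizes must add up to the number of items")
    shuffled = list(items)  # copy, so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible split
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]
    return train, test, val


# Hypothetical file names standing in for the 693 microphotographs.
image_names = [f"image_{i:03d}.jpg" for i in range(693)]
train_set, test_set, val_set = split_dataset(image_names, 195, 120, 378)
```

Fixing the seed keeps the split reproducible, so the same images are held out each time the model is retrained.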

Discussion
Although microscopy on direct MGG stained smears is the traditionally accepted method, it suffers from the drawbacks of low sensitivity and variable inter-observer reproducibility. Cell block preparation and IHC have been found to be of optimal value in only 10-12% of fluids (in a population with relatively low prevalence of malignancy), above which their diagnostic utility diminishes. [9] The sensitivity and specificity of IHC in detection of malignancy in body fluid have not yet exceeded 90%. [20] [21] Artificial neural networks have emerged as a useful decision support tool in cytopathology. [15] Pathologist's assistant machine learning models have been developed for cervical [16] [17] and thyroid [18] [19] cytology. The present study aims to develop such a model for recognising malignant cells from body fluid cytology smears.
There have been few studies regarding such automated decision systems on body fluid cytology.
There has been a focus on extracting geometric features from the image and applying machine learning models over the extracted features. For example, Win et al. extracted features from the image and used them as inputs for an artificial neural network. On a sample of 125 images, their method achieved sensitivity of 87.97%, specificity of 99.40%, with 98.70% diagnostic accuracy. [22] Baykal et al. used an active appearance model to achieve effective cell segmentation from cytopathological images, with good diagnostic accuracy. [23] The wavelet transform has also been shown to achieve a high recognition ratio. [24] Zhang et al. have used morphometric parameters (area rate of the karyon and cytoplasm, the optic density, the shape factor) with a fuzzy pattern recognition model to detect cancer cells. [25] In a related study analysing the difference between malignant and benign mesothelial cell proliferations, Tosun et al. found that the quantification of chromatin distribution is 100% predictive of whether a cell is malignant. [26] We have not extracted any geometric feature from images, because we have included images from a variety of foci from which such morphometry is not possible. Our study sample has included smears with a background rich in eosinophilic material, a few foci with dense inflammatory infiltrate, and also a few foci with very sparse malignant cells. Keeping all this variation in mind, we have used the entire image as input to the neural network. Even so, this method has yielded 97% sensitivity (on individual foci), albeit with moderate specificity (85%).

Figure 11: Receiver operating characteristics (ROC) of the model
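For reference, sensitivity, specificity, PPV and NPV are all derived from the four cells of a confusion matrix. The sketch below shows the arithmetic; the function name is hypothetical and the counts are illustrative only, not the counts from this study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard screening metrics from confusion-matrix counts.

    tp: malignant images flagged as malignant
    fp: benign images flagged as malignant
    tn: benign images flagged as benign
    fn: malignant images flagged as benign
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of malignant images detected
        "specificity": tn / (tn + fp),  # share of benign images cleared
        "ppv": tp / (tp + fp),          # share of 'malignant' calls that are correct
        "npv": tn / (tn + fn),          # share of 'benign' calls that are correct
    }


# Illustrative counts only (hypothetical, not taken from the study).
metrics = screening_metrics(tp=90, fp=20, tn=80, fn=10)
```

For a screening tool, sensitivity and NPV matter most, because a missed malignant focus (false negative) is never shown to the pathologist, whereas a false positive is caught on review.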

Conflicts of interest
None to declare

Funding
No external funding was received for this study.