Automatic cervical cell segmentation and classification in Pap smears

https://doi.org/10.1016/j.cmpb.2013.12.012

Abstract

Cervical cancer is one of the leading causes of cancer death in females worldwide. The disease can be cured if the patient is diagnosed in the pre-cancerous lesion stage or earlier. A common physical examination technique widely used in screening is the Papanicolaou test, or Pap test. In this research, a method for automatic cervical cancer cell segmentation and classification is proposed. A single-cell image is segmented into nucleus, cytoplasm, and background using the fuzzy C-means (FCM) clustering technique. Four cell classes in the ERUDIT and LCH datasets, i.e., normal, low grade squamous intraepithelial lesion (LSIL), high grade squamous intraepithelial lesion (HSIL), and squamous cell carcinoma (SCC), are considered. A 2-class problem is obtained by grouping the last 3 classes into one abnormal class. The Herlev dataset, in contrast, consists of 7 cell classes, i.e., superficial squamous, intermediate squamous, columnar, mild dysplasia, moderate dysplasia, severe dysplasia, and carcinoma in situ. These 7 classes can also be grouped to form a 2-class problem. The 3 datasets were tested with 5 classifiers: Bayesian classifier, linear discriminant analysis (LDA), K-nearest neighbor (KNN), artificial neural networks (ANN), and support vector machine (SVM). For the ERUDIT dataset, ANN with 5 nucleus-based features yielded accuracies of 96.20% and 97.83% on the 4-class and 2-class problems, respectively. For the Herlev dataset, ANN with 9 cell-based features yielded accuracies of 93.78% and 99.27% on the 7-class and 2-class problems, respectively. For the LCH dataset, ANN with 9 cell-based features yielded accuracies of 95.00% and 97.00% on the 4-class and 2-class problems, respectively. The segmentation and classification performances of the proposed method were compared with those of the hard C-means clustering and watershed techniques. The results show that the proposed automatic approach yields very good performance and is better than its counterparts.

Introduction

Cervical cancer is the fourth leading cause of cancer death in females worldwide [1]. The prognosis for cervical cancer depends on the stage of the cancer at the time of detection. The disease can be cured if diagnosed in the pre-cancerous lesion stage or earlier. The Papanicolaou test, or Pap test, is a physical examination technique widely used to prevent cervical cancer by finding cells that have the potential to turn cancerous. It was estimated in 2006 that systematic screening can reduce mortality rates from cervical cancer by 70% or more [2].

In Thailand, cervical cancer is the second most common cancer among women [3], with a high mortality rate of nearly 300,000 per year. The screening program administered by the Ministry of Public Health and the National Health Security Office suggested that women aged 35–60 years should undergo Pap smear examination every five years [4]. With an overall population of 67 million in Thailand [5], the number of samples to be examined per year is large compared with the number of cytologists who perform the screening. Automatic cervical cell classification has been studied for over 40 years to cope with the labor-intensive task of manual screening and to reduce errors in the screening results. A number of commercial automated screening systems have been approved by the FDA for quality control; examples include PAPNET (Neuromedical Systems Inc.), FocalPoint Slide Profiler™ (formerly AUTOPAP; BD TriPath), ThinPrep Pap Test, ThinPrep Imaging System (Hologic Inc.) and Imager™ (Cytyc). Several research works have shown that these automated systems indeed improve the accuracy of the screening result and reduce the false-negative rate [6], [7], [8]. However, poor cost effectiveness is a major drawback of these systems: the cost of a PAPNET test far exceeds that of manual screening.

Uncertainty in diagnostic capability was also reported [9]. It is therefore suggested that an automated system should be used as an aiding tool in conjunction with the expert's opinion, rather than being relied on as the primary screening and diagnosing tool [10], [11]. Over the past few years, the trend of research in automated cervical cancer screening has shifted from cytology screening to histology images [12], [13] and colposcopic images [14], [15]. Histology images are used not only in cervical cancer screening but also in screening for other kinds of cancer [16], [17], [18]. However, cytology screening is still the default screening method in most countries due to its relatively low cost and its effectiveness in cervical cancer prevention if the screening is routinely performed.

The screening process normally starts with gathering cervical cell samples from the uterine cervix and mounting them on a glass slide. The collected sample is visually inspected under a microscope to identify the target cells or to classify each cell into categories. The basic characteristics used to classify the stage of cells are mainly those of the cell nucleus and cytoplasm, such as shape, size, texture, and the ratio of nucleus to cytoplasm. From an image processing point of view, the first step in extracting information from cell components is to correctly identify the region of each component (nucleus, cytoplasm, and non-cell components) by a segmentation process. There are several research works on nucleus segmentation [19], [20], [21], [22]. However, classifying each cervical cell into categories with only nucleus information might not yield good performance. Hence, segmenting the whole cell is more desirable [23], [24], [25], [26]. However, no classification result is reported in these works. After the segmentation step, each cell is classified either by specific classifiers based on features extracted from the cell components, as mentioned earlier, or by filters that discriminate classes without a feature extraction process [27]. However, the classification performance in Ref. [27] is not very high.
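To make the nucleus-to-cytoplasm ratio concrete, the following minimal sketch computes region areas and the ratio from an already segmented single-cell mask. The label convention (0 = background, 1 = cytoplasm, 2 = nucleus), the function names, and the toy mask are illustrative assumptions, not taken from the paper, whose actual feature set is not listed in this excerpt.

```python
# Hypothetical label convention (an assumption, not from the paper):
# 0 = background, 1 = cytoplasm, 2 = nucleus.
BACKGROUND, CYTOPLASM, NUCLEUS = 0, 1, 2


def region_areas(mask):
    """Count pixels per label in a 2-D list-of-lists segmentation mask."""
    areas = {BACKGROUND: 0, CYTOPLASM: 0, NUCLEUS: 0}
    for row in mask:
        for label in row:
            areas[label] += 1
    return areas


def nucleus_cytoplasm_ratio(mask):
    """Return nucleus area divided by cytoplasm area (None if no cytoplasm)."""
    areas = region_areas(mask)
    if areas[CYTOPLASM] == 0:
        return None
    return areas[NUCLEUS] / areas[CYTOPLASM]


# Toy 6x6 mask: a 2x2 nucleus inside a 4x4 cell on a background border.
toy_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 2, 2, 1, 0],
    [0, 1, 2, 2, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
toy_areas = region_areas(toy_mask)
toy_ratio = nucleus_cytoplasm_ratio(toy_mask)
```

In this toy mask the nucleus covers 4 pixels and the cytoplasm 12, so the ratio is 1/3. A size feature of this kind can only be as reliable as the segmentation that produces the mask.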

In this research, we propose a method for automatic segmentation and classification of cervical cell images. In the segmentation process, we use a patch-based fuzzy C-means (FCM) clustering technique. A cell image is segmented into nucleus, cytoplasm, and background by using the over-segment FCM technique. For comparison, we also use the hard C-means clustering technique and the watershed segmentation technique in the segmentation step. Features are then extracted from the segmented image and used as the input to the classifiers. Five classifiers are considered: Bayesian classifier, linear discriminant analysis, K-nearest neighbor, artificial neural networks, and support vector machine. The usefulness of features based on the nucleus and on the entire cell is also investigated.

This paper is organized as follows. The next section describes the basics of the segmentation technique, mathematical morphology, feature extraction, and the classifiers, followed by the details of the cell datasets used in this research. Section 3 provides the experimental results and discussion. The conclusion is drawn in Section 4.

Cervical cell segmentation using patch-based fuzzy C-means clustering

We originally developed the patch-based fuzzy C-means (FCM) clustering method to segment the nuclei and cytoplasm of white blood cells [28]. It was later applied to segment the nuclei of cervical cells from the conventional Pap test [29]. The FCM is well suited to clustering data with uncertainty. We therefore chose the FCM to cluster the uncertain cell image intensity data. In this research, the segmentation method is tested with cervical cells not only from the conventional Pap test but also from the ThinPrep® Pap test.
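As a rough illustration of the clustering idea, the sketch below runs the standard (textbook) FCM updates on scalar intensity values and then assigns each value to its highest-membership cluster. It is not the patch-based or over-segment variant used in this work, whose details are not given in this excerpt. The function name, fuzzifier m = 2, initial centers, and toy intensities are all assumptions made for illustration.

```python
def fuzzy_c_means_1d(values, centers, m=2.0, max_iter=100, tol=1e-6):
    """Standard FCM on scalar data; returns (centers, memberships)."""
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(max_iter):
        # Membership update: u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)).
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # Value coincides with a center: full membership there.
                hit = dists.index(0.0)
                row = [1.0 if i == hit else 0.0 for i in range(len(centers))]
            else:
                row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                       for d_i in dists]
            memberships.append(row)
        # Center update: mean of the data weighted by u^m.
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total = sum(w * x for w, x in zip(weights, values))
            new_centers.append(total / sum(weights))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Toy grey levels (assumed): dark nucleus, mid-grey cytoplasm, bright background.
intensities = [38, 42, 40, 45, 118, 122, 125, 119, 218, 222, 225, 220]
final_centers, u = fuzzy_c_means_1d(intensities, centers=[0.0, 128.0, 255.0])

# Defuzzify: each value goes to the cluster with the largest membership.
hard_labels = [row.index(max(row)) for row in u]
```

Unlike hard C-means, each value keeps a graded membership in every cluster until the final defuzzification step, which is what makes the method tolerant of ambiguous boundary intensities.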

Results and discussion

To assess how well the classification results generalize, leave-one-out cross validation (LOOCV) was applied throughout the experiments. All results shown in this section are for the validation data in the LOOCV.
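The LOOCV procedure can be sketched as follows: each sample is held out once, a classifier is fitted on the remaining samples, and the held-out sample is used for validation. The 1-nearest-neighbor classifier here is only a simple stand-in for the classifiers studied, and the two features and their values are made-up toy data, not results or features from the paper.

```python
def nearest_neighbor_predict(train_x, train_y, query):
    """1-nearest-neighbor prediction using squared Euclidean distance."""
    best_label, best_dist = None, float("inf")
    for features, label in zip(train_x, train_y):
        dist = sum((a - b) ** 2 for a, b in zip(features, query))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(samples, labels):
    """Hold out each sample once and return the fraction predicted correctly."""
    correct = 0
    for i in range(len(samples)):
        # Train on everything except sample i.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


# Toy two-feature samples (assumed values, for illustration only).
toy_samples = [
    [1.0, 0.10], [1.2, 0.12], [0.9, 0.08],   # labelled normal
    [3.0, 0.50], [3.2, 0.55], [2.9, 0.45],   # labelled abnormal
]
toy_labels = ["normal"] * 3 + ["abnormal"] * 3
loocv_accuracy = leave_one_out_accuracy(toy_samples, toy_labels)
```

With N samples the classifier is fitted N times, so LOOCV is costly for slow-to-train models, but every reported prediction is made on a sample that was not seen during training.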

Conclusions

This research proposes a method of automatic cervical cell image segmentation and classification. We used the over-segment fuzzy C-means clustering technique to segment each cell into 2 or 3 regions. Three Pap smear datasets, i.e., ERUDIT, LCH, and Herlev, were tested. The 4-class problem consists of 4 cell classes, i.e., normal, low grade squamous intraepithelial lesion (LSIL), high grade squamous intraepithelial lesion (HSIL), and squamous cell carcinoma (SCC). When the last 3 classes are

Conflict of interest statement

The authors declare no conflict of interest.

Acknowledgements

Financial support from the Thailand Research Fund through the Royal Golden Jubilee Ph.D. Program (Grant No. PHD/0238/2550) to Thanatip Chankong and Nipon Theera-Umpon is acknowledged. We thank Dr. Taweethong Koanantakool, Department of Medical Services, Ministry of Public Health, Thailand, for introducing us to the cervical cancer classification problem. We would like to thank Dr. Jan Jantzen for supporting the ERUDIT and Herlev Pap smear datasets. We are thankful to Lampang Cancer Hospital,

References (47)

  • P.-Y. Pai et al. Nucleus and cytoplast contour detector from a cervical smear image. Expert Systems with Applications (2012)
  • A. Genctav et al. Unsupervised segmentation and classification of cervical cell images. Pattern Recognition (2012)
  • G.K. Matsopoulos et al. MITIS: a WWW-based medical system for managing and processing gynaecological–obstetrical–radiological data. Computer Methods and Programs in Biomedicine (2004)
  • A. Jemal et al. Global cancer statistics. CA: A Cancer Journal for Clinicians (2011)
  • H.C. Kitchener et al. Achievements and limitations of cervical cytology screening. Vaccine (2006)
  • (2011)
  • International Health Policy Program Thailand (IHPP) and Health Intervention and Technology Assessment Program (HITAP). Research for Development of an Optimal Policy Strategy for Prevention and Control of Cervical Cancer in Thailand (2008)
  • Central Intelligence Agency. Thailand Demographics Profile 2012 (2012)
  • R. Ashfaq et al. Detection of endocervical component by PAPNET™ system on negative cervical smears. Diagnostic Cytopathology (1996)
  • J.S. Lee et al. A feasibility study of the AutoPap system location-guided screening. Acta Cytologica (1998)
  • H. Doornewaard et al. Reproducibility in double scanning of cervical smears with the PAPNET system. Acta Cytologica (2000)
  • C.V. Biscotti et al. Assisted primary screening using the automated ThinPrep imaging system. American Journal of Clinical Pathology (2005)
  • S.M. Freire et al. A record linkage process of a cervical cancer screening database. Computer Methods and Programs in Biomedicine (2013)