Abstract
Segmenting bone from background is required to quantify bone architecture in computed tomography (CT) image data. A deep learning approach using convolutional neural networks (CNNs) is a promising alternative for automatic segmentation. The study objectives were to evaluate the performance of CNNs in automatic segmentation of human vertebral body (micro-CT) and femoral neck (nano-CT) data and to investigate the performance of CNNs in segmenting data across scanners.
Scans of human L1 vertebral bodies (micro-CT [North Star Imaging], n=28, 53 μm voxels) and femoral necks (nano-CT [GE], n=28, 27 μm voxels) were used for evaluation. Six slices were selected from each scan and manually segmented to create ground truth masks (Dragonfly 4.0, ORS). Two-dimensional U-Net CNNs were trained in Dragonfly 4.0 with images of the femoral necks only [FN], vertebral bodies only [VB], and combined CT data [FN+VB]. Global (i.e., Otsu and Yen) and local (i.e., Otsu, r=100) thresholding methods were applied to each dataset. Segmentation performance was evaluated using the Dice coefficient, a similarity metric of overlap. Kruskal-Wallis and Tukey-Kramer post-hoc tests were used to test for significant differences in the accuracy of the segmentation methods.
On femoral neck image data, the FN U-Net had significantly higher Dice coefficients (i.e., better performance) than the global (Otsu: p=0.001; Yen: p=0.001) and local (Otsu, r=100: p=0.001) thresholding methods and the VB U-Net (p=0.001), but there was no significant difference in performance compared to the FN+VB U-Net (p=0.783). On vertebral body image data, the VB U-Net had significantly higher Dice coefficients than the global and local Otsu methods (p=0.001 for both) and the FN U-Net (p=0.001), but not compared to the Yen threshold (p=0.462) or the FN+VB U-Net (p=0.783).
The results demonstrate that the U-Net architecture outperforms common thresholding methods. Further, a network trained with bone data from a different system (i.e., different image acquisition parameters and voxel size) and a different anatomical site can perform well on unseen data. Finally, a network trained with the combined datasets performed well on both, indicating that a network can feasibly be trained with multiple datasets and perform well on varied image data.
Competing Interest Statement
Benjamin Provencher, Nicolas Pichè, and Mike Marsh are employees of Object Research Systems, Inc. The remaining authors have no conflicts of interest to declare.