A machine learning based approach to the segmentation of micro CT data in archaeological and evolutionary sciences

Segmentation of high-resolution tomographic data is often an extremely time-consuming task and, until recently, has usually relied upon researchers manually selecting materials of interest slice by slice. With the exponential rise in datasets being acquired, this is clearly not a sustainable workflow. In this paper, we apply Trainable Weka Segmentation (a freely available plugin for the multiplatform program ImageJ) to typical datasets found in archaeological and evolutionary sciences. We demonstrate that Trainable Weka Segmentation can provide a fast and robust method for segmentation and is as effective as other leading-edge machine learning segmentation techniques.


1. School of Life Sciences, Faculty of Science and Engineering, Anglia Ruskin [...]

Locally adaptive segmentation
Locally adaptive segmentation is increasingly carried out using deep learning in an automated [...] (Forgy, 1965; MacQueen, 1967). This method, and extensions of it, have been used widely in MRI processing (e.g. Dimitriadou et al., 2004; Juang and Wu, 2010; Singh et al., 1996). Interestingly, both Forgy (1965) and MacQueen (1967) cautioned against using k-means clustering as a definitive algorithm, recommending it instead as an aid to the user in interpreting clusters of data. As such, one should always apply a 'sense check' to data resulting from this clustering method.
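As a rough illustration of the k-means idea and of the recommended 'sense check', the sketch below clusters a handful of scalar grey values and then reports each cluster's centroid and size for inspection. It is a minimal, assumption-laden example rather than the implementation used in this study: the function name `kmeans_1d`, the random seed and the grey values are all hypothetical, and real CT data would be clustered over whole images or volumes.

```python
import random


def kmeans_1d(values, k, max_iter=100, seed=0):
    """Lloyd-style k-means on scalar grey values; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Forgy-style start: use k distinct observations as the initial centroids.
    centroids = rng.sample(sorted(set(values)), k)

    def nearest(v):
        # Index of the centroid closest to grey value v.
        return min(range(k), key=lambda j: abs(v - centroids[j]))

    for _ in range(max_iter):
        # Assignment step: every value joins its nearest centroid.
        labels = [nearest(v) for v in values]
        # Update step: every centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, [nearest(v) for v in values]


# Hypothetical grey values: a dark material (about 20) and a bright one (about 200).
voxels = [18, 22, 25, 19, 21, 198, 205, 210, 202, 24, 199]
centroids, labels = kmeans_1d(voxels, k=2)

# 'Sense check': inspect centroids and cluster sizes before trusting the labels.
for j, c in enumerate(centroids):
    print(f"cluster {j}: centroid = {c:.1f}, voxels = {labels.count(j)}")
```

The printed summary is only a starting point; whether two clusters correspond to two real materials still has to be judged by the user against the images.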

Another popular clustering algorithm is fuzzy c-means (Bezdek, 1975, 1980; Pham and Prince, 1999), which is an example of a 'soft' clustering method, where probabilities of group allocation are given. Again, this is popular for the automated segmentation of MRI data (e.g. [...]

We wish to address the following specific questions: [...]

[...] adhering to it which obscure some more detailed aspects of its morphology.

6. An animal mummy, Manchester Museum number 6033. This is thought to be a shrew, based upon size of the wrappings and earlier medical X-Rays (Adams, 2015).

Full scan parameters are shown in Table 1 and volume renders of the tested datasets are shown in [...]

These datasets were processed using a series of competing algorithms in ImageJ:

• Trainable Weka Segmentation
[...]

The datasets were also processed using localised fuzzy c-means segmentation, with pre-selection through k-means clustering using the Debian Linux package MIA-tools (Dunmore et al., 2018). We also provide a full step-by-step guide on using the Weka segmentation for MicroCT [...]

[...] between efficiency and efficacy starts to plateau after ~250 trees (Probst and Boulesteix, 2018).

All images were then segmented using the appropriate training dataset.

All stacks were processed on one of two machines with 32 GB RAM, a PCIe M.2 SSD and either a 6-core i7 at 3.6 GHz (4.2 GHz at boost) or an 8-core AMD 2700 at 3.2 GHz (4 GHz at boost). Due to the way the Java virtual machine is configured, graphics card parameters are not currently relevant for this workflow.

[...] and intra- and inter-observer variation for both image stacks are reported.

Statistical comparisons
The effects of the varying segmentation algorithms on real-world results are the most important consideration, as it is sensible to anticipate that users will be most concerned about the accuracy of these. Given that many of the errors in segmentation were related to the artificial noise introduced into the dataset, and are the type of noise that would be removed from a 3D model by a user, it was decided that automated measurement of the object in question was desirable. We [...]

Synthesised dataset
The majority of the data segmented relatively easily, but both two-dimensional k-means and local c-means struggled with the smaller triangles, where noise was closer to the dimensions of the object of interest (Figure 8). Repeatability and error of this segmentation are shown in Table 3. Principal components analysis (Figure 9) demonstrated that the two-dimensional c-means and k-means segmentation resulted in large measurement errors due to noise. The most accurate segmentations (in terms of measurement accuracy) were those produced by Weka and three-dimensional local c-means. Interestingly, the most accurate overall segmentation produced by Weka segmentation had the worst precision in terms of linear measurements for this segmentation type (the outlier in Figure 10), demonstrating that overall accuracy is probably not an ideal real-world statistic to report when one is interested in particular features of images.

[...] segmentation performed as well as the two-dimensional local c-means segmentation and improved some aspects of fine detail retrieval (Figure 11). Intra-observer repeatability was within 5.63 microns, which is lower than voxel size.

<<FIGURE 11 ABOUT HERE>>
<<TABLE 4 ABOUT HERE>>

Wild type mouse tibia
The Weka segmentation performed better than most of the other types of segmentation, with improved quality on fine features (Figure 12). The three-dimensional local c-means segmentation generated the smallest external cortex of the bone (so overall the object was subtly smaller) (Figures 13 and 14 [...]

Violin plots indicate that alternative segmentation methods have subtly different distributions in terms of distance from the Weka segmentation (Figure 15). All suffer from arbitrary spiking in the distribution of deviation of the data relative to the Weka segmentation.

<<FIGURE 15 ABOUT HERE>>

It is also apparent that differing segmentation techniques have a marked effect on the degree of anisotropy detected in trabecular bone, with Weka tending towards more anisotropic structures. This may be because of the lack of spiking in the resulting segmentation when compared with the other methods here. The Ellipsoid Factor (a replacement for the Structure Model Index; Doube, 2015; Salmon et al., 2015) also varies considerably, with a difference of almost 4% between Weka and watershed segmentation (Table 5).

It is also noticeable that Weka segmentation gives a relatively low percentage of bone and also a relatively low trabecular thickness. Three-dimensional local c-means was very dependent on the grid size employed and tended to be very conservative at the grid size chosen (7) in terms of pixels classified as bone.

<<TABLE 5 ABOUT HERE>>

Lemur larynx
The Weka segmentation was able to account for the ring artefacts in the scan and successfully segmented the materials of interest. It was also more successful at segmenting the finer structures in the larynx (see Figures 16 and 17). It also generated much cleaner data than all other segmentations.

[...]

The Weka segmentation was able to track trabecular structure successfully, without eroding the material.

It was also able to take into account the slight 'halo' effect on the bone/air interface, which conventional segmentation converted into an external border of the matrix material. The c-means and k-means segmentation both created this 'halo'-like border (Figures 18 and 19). The watershed segmentation performed very well in this instance as the scan was relatively clean with good contrast between the different materials. The three-dimensional local c-means segmentation was unable to compensate for the ring artefacts in the scan and classified these with the fossil matrix. It also had the 'halo' seen in two-dimensional c-means.

[...]

Animal mummy
The Weka segmentation was able to detect the majority of the skeletal features and also discriminate the mummified tissues from the outer wrappings. There were a few artefacts around the front paws and mandible which would require some manual correction to fully delineate the structures. The smoothing steps introduced in the segmentation were able to remove many of the scanning artefacts which made borders of materials harder to resolve with conventional methods. Two-dimensional k-means segmentation was also able to discriminate materials relatively well, although it did misclassify several slices.

[...]

Firstly, we address the specific questions raised in our introduction.

[...]

For clustering, simple objects that had only two material types yielded the most satisfactory segmentations. The 'halo effect' noted in two-dimensional k-means and both variants of local c-means clustering is, however, an issue when trying to classify a range of materials within one dataset (such as muscle, fat, and bone in a scan of a wet tissue sample), as these may be of equal interest to researchers. We surmise that the reason for this artefact is linked to the partial volume averaging effect at material boundaries (Goodenough et al., 1981). Essentially, at the boundary, the probability of the voxel belonging to either material is equal, and the algorithm therefore assigns a null material as it has failed to classify the data.

The repeatability of the Weka segmentation between observers is also within acceptable margins (<0.2% difference between 5 observers) and intra-observer repeatability over a single complex sample is also acceptable.

[...]

In the case of the synthetic data, the thickness results obtained were an order of magnitude more accurate than two-dimensional ones. For the wire artefact, three-dimensional local c-means produced by far the most accurate segmentation, but three-dimensional k-means did not appear to change much. For other datasets, three-dimensional approaches were much more effective as they did not suffer from 'flipping' of label order, thereby reducing post-processing significantly.
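The partial volume explanation of the 'halo' artefact given earlier in this discussion can be illustrated with the standard fuzzy c-means membership formula, u_i = 1 / Σ_k (d_i / d_k)^(2/(m−1)), where d_i is the distance from a voxel's grey value to centroid i and m is the fuzzifier. The sketch below is a hedged toy example, not the MIA-tools implementation: the centroid values, the function names and the 0.6 decision threshold used to produce a 'null' label are all assumptions made for illustration.

```python
def fcm_memberships(value, centroids, m=2.0):
    """Fuzzy c-means membership of one grey value in each cluster (fuzzifier m > 1)."""
    distances = [abs(value - c) for c in centroids]
    # A value lying exactly on a centroid belongs entirely to that cluster.
    if 0.0 in distances:
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
            for d_i in distances]


def classify(value, centroids, threshold=0.6):
    """Return the winning cluster index, or None when no membership is decisive.

    The threshold is a hypothetical choice used only to mimic a 'null' label.
    """
    memberships = fcm_memberships(value, centroids)
    best = max(range(len(centroids)), key=lambda j: memberships[j])
    return best if memberships[best] >= threshold else None


# Hypothetical centroids: air at grey value 20, bone at grey value 200.
centroids = [20, 200]
for grey in (20, 65, 110, 200):
    print(grey, fcm_memberships(grey, centroids), classify(grey, centroids))
```

In this toy setting a boundary voxel whose grey value is averaged to the midpoint (110) receives a membership of 0.5 in each material, so no class is decisive, which is consistent with the thin unclassified border described above.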

Overall, three-dimensional clustering approaches appear to be more effective and, as such, are recommended over two-dimensional ones.

Weka segmentation shows good repeatability both between users and within users. Inter-observer error on the synthetic dataset was less than 0.2%. Intra-observer repeatability over a single complex sample is also acceptable, being less than voxel size.

More generally, the Weka algorithms can be applied to a wide range of image types, as the plugin was originally developed for microscopy (Arganda-Carreras et al., 2017). We have tested this on image datasets from 8 to 32-bit depth. Given workflow constraints, most images used in everyday analysis will be either 8 or 16-bit. DICOM data requires conversion to TIFF or another standard image format before processing with Weka. ImageJ/Weka segmentation is multi-platform and has a user-friendly GUI. This makes it an ideal toolbox to teach researchers (who may be unfamiliar with the subtleties of image processing) a fast and free way to process their CT data. Key recommendations are to use a fast CPU with multiple cores, which will enable users to fully leverage multi-threading, and to use fast drives (preferably solid-state drives) if working on a desktop. Training the segmentation using fine structures will also improve delineation of edge features. Finally, the use of a graphics tablet is also recommended.

Two-dimensional localised c-means segmentation in both ImageJ and Matlab also tends to flip the order of labels in some images, which then necessitates further steps of interleaving different stacks to obtain one segmentation. This is probably straightforward to address by forcing the order of labels in the algorithm, but doing so is beyond the scope of this paper. Our suggestion, therefore, is that if k-means or c-means clustering is used for segmentation, a three-dimensional approach should be taken.
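One possible way of forcing a consistent label order, offered here only as a sketch and not as part of the workflow tested in this paper, is to renumber the clusters of each slice by their mean grey value so that the darkest material always receives label 0. The function name `relabel_by_intensity` and the example values are hypothetical, and the approach assumes that the materials keep the same brightness ranking in every slice.

```python
def relabel_by_intensity(labels, intensities):
    """Renumber cluster labels so that 0 is the darkest cluster, 1 the next, and so on."""
    sums, counts = {}, {}
    for lab, val in zip(labels, intensities):
        sums[lab] = sums.get(lab, 0.0) + val
        counts[lab] = counts.get(lab, 0) + 1
    # Rank the original labels by the mean grey value of their members.
    ranked = sorted(sums, key=lambda lab: sums[lab] / counts[lab])
    mapping = {old: new for new, old in enumerate(ranked)}
    return [mapping[lab] for lab in labels]


# Two hypothetical slices of the same object; the second has its labels flipped.
slice_a = relabel_by_intensity([0, 0, 1, 1], [20, 22, 200, 205])
slice_b = relabel_by_intensity([1, 0, 0, 1], [21, 198, 203, 19])
print(slice_a)
print(slice_b)
```

After relabelling, the dark material carries label 0 in both slices, so the stacks no longer need to be interleaved by hand. Materials with overlapping grey values between slices would defeat this simple ranking.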

The implementation of Weka segmentation is fast, with no software cost to the end user, and it enables an easy introduction to both image segmentation and machine learning for the inexperienced user. Future work will seek to apply this algorithm to larger and more varied samples and to increase the speed of computation, either through GPU-based acceleration or use of virtual clusters.

Acknowledgements
We would like to thank the curators of the collections from which scan material was used for this