Automated identification and registration of anatomical landmarks in C. elegans

The physiology of the nematode C. elegans can be visualized with many microscopy techniques. However, quantitative microscopy of C. elegans is complicated by the flexible and deformable nature of the nematode. These differences in posture and shape must be addressed in some fashion in any automated or manual analysis. Manual approaches are time intensive and require hand-labeling anatomical regions of interest. Automated tools exist, but generally rely on high-magnification imaging using labeled nuclei as fiducial markers. Here we describe a suite of new tools that allows for high-throughput analysis of whole-body images, aligned using anatomical landmarks identified from brightfield images. We show how these tools can be used in basic morphometric tasks and examine anatomical variation and morphological changes in a population over time.

variations in internal anatomy across individuals. This allows the straightening and internal alignment of any set of imaging data as long as corresponding brightfield images are available. Finally, we demonstrate how this approach can be used to quantify population trends in anatomical variation in a dataset comprising thousands of images of 118 individual worms observed from young adulthood until death.

Accounting for positional variation

To compare image data across individuals and temporally, it is first necessary to account for the effects of the animal's orientation and posture within each image. This requires identifying regions in separate images that correspond to the anatomical structures of interest. Manual efforts such as circling the regions corresponding to a specific neuron across a set of fluorescent images, for example, are a special case of this general task. A systematic computational approach to the problem is to define a worm-local coordinate system, such that each image pixel within the animal can be labeled with two coordinates: the pixel's position nose-to-tail along the animal's anterior-posterior (AP) axis and, in the case of 2D imaging of animals crawling on solid media, its position along the dorsal-ventral (DV) axis, which is locally orthogonal to the AP axis (Fig 1B). Locating an animal's centerline (see Methods for details) is sufficient for this task, as the centerline defines the AP axis and the vectors normal to every point along the centerline define the DV axis. Once these axes are determined, it is simple to generate a straightened "worm frame of reference" image by sampling image intensities along a grid of anterior-posterior and dorsal-ventral positions (Fig 1B). In this way, whole-animal images can be directly aligned and compared. Alternately, it is also possible to measure localized image intensity patterns without any image transformation by simply defining sub-regions of interest in the AP/DV coordinate system. These "worm-frame" coordinates can be transformed back into pixel regions in any particular "lab-frame" image, allowing raw, untransformed pixel values from those regions to be compared directly (Fig 1C).
Straightened "worm-frame" images remain valid for quantitative comparison

At its core, transforming an image from the lab frame into the worm frame is a sampling problem. Computationally straightening the curved worm body involves mapping points from one coordinate system (the lab frame of reference) to another (the worm frame of reference). A potential problem arises, however, when mapping points along a highly curved shape like that of a coiled animal. Because the arc length differs between the inner and outer edges of a curved shape, sampling image intensities at single-pixel spacing along the centerline produces a relative undersampling of pixel intensities along the outer edge and a relative oversampling of intensities along the inner edge (Fig 2). This results in image intensities along the outer edges being under-represented in worm-frame images, and those along the inner edges being duplicated. An example is depicted in Fig 2A.

Such issues are often waved away by noting that computationally straightening curved surfaces is a smooth, fully invertible transformation (9,16). And indeed, it is equally possible to map from the lab frame to the worm frame and vice versa. This latter property is especially useful for making tissue-to-tissue correspondences between two different worm images. Since the origin of every pixel in the worm frame of reference is known, point-correspondences can be made between any lab-frame image and another image from the worm frame of reference. However, an invertible transformation is not necessarily area-preserving, as shown in Fig 2A: the inner edges of curves are expanded during straightening and the outer edges are shrunken.
This is of particular consequence for quantitative fluorescence imaging: there is no guarantee that total fluorescence intensity in highly curved sections of an animal will be conserved during the lab-frame to worm-frame transformation. We thus found it important to evaluate whether the size of this effect is a relevant practical problem. We quantified the effects of computationally straightening C. elegans images using a dataset of several thousand (n=6527) fluorescence images of adult Plin-4::GFP;spe-9(hc88) animals (see Methods). We measured the differences in area, total pixel intensity, and the distribution of pixel intensities between fluorescence images before and after straightening (i.e. in the lab and worm frames of reference). In all cases these statistics were calculated only over pixels within the region corresponding to an individual C. elegans, ignoring background pixels.

To understand the scale at which such differences would become biologically meaningful, we compared these statistics to those calculated between unstraightened, lab-frame images of the same individuals taken three hours later. We reason that worm size and GFP expression do not vary greatly in adult Plin-4::GFP animals over a three-hour interval. Thus, measured differences between such images would be due to variability in the imaging process or changes in the animal's position/posture, and thus represent a level of variation that is safe to consider as "biologically negligible".

Therefore, we can conclude that the errors produced by the straightening procedure are sufficiently small as to be biologically negligible if they are smaller than those observed between consecutive lab-frame images taken at three-hour intervals. To define a scale at which image differences are definitively non-negligible, we also compared lab-frame images of random pairs of different, non-age-matched individuals.

To measure the total amount of area and pixel intensity altered by the straightening procedure, it was important to ensure that worm area/pixel intensity created on the inside of each bend was not nullified by area/intensity removed on the outside of the same or other bends. Even if net image intensity or area remains unchanged, we wanted to examine the extent to which image measurements from any particular anatomical region might be distorted by straightening. Therefore, we split the worm regions into a "checkerboard" pattern to ensure that the inside and outside edges of each bend would be compared separately (Fig 2B). The size of the checkerboard was chosen to be smaller than the size of typical worm bends. The differences in area shown in Fig 2C are the sum of the absolute value of the difference in area of each matching checkerboard section between a worm-frame and lab-frame image, or two different lab-frame images. Similarly, the difference in total fluorescence intensity is the sum of the absolute differences in total intensity between matching sections.
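As a sketch of this bookkeeping, the checkerboard comparison can be implemented by tiling both images with small sections and summing the absolute per-section differences. This toy version (names and tile size our own, and assuming the two images are already pixel-aligned) shows how the per-section sum catches distortions that cancel globally:

```python
def checkerboard_difference(img_a, img_b, tile=8):
    """Sum of absolute differences in total intensity between matching
    checkerboard sections of two images (nested lists of numbers), so that
    intensity gained on the inside of one bend cannot cancel intensity
    lost on the outside of another."""
    h, w = len(img_a), len(img_a[0])
    total = 0.0
    for r in range(0, h, tile):
        for c in range(0, w, tile):
            # Total intensity of the matching section in each image.
            sect_a = sum(img_a[i][j] for i in range(r, min(r + tile, h))
                                     for j in range(c, min(c + tile, w)))
            sect_b = sum(img_b[i][j] for i in range(r, min(r + tile, h))
                                     for j in range(c, min(c + tile, w)))
            total += abs(sect_a - sect_b)
    return total
```

For example, adding +5 in one section and -5 in another leaves the whole-image intensity unchanged but yields a checkerboard difference of 10.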

Image regions with the same total fluorescence intensity do not necessarily have identical pixel patterns. For example, image warping might lead to the loss of a single very bright pixel and the duplication of several dimmer pixels, leading to an identical total pixel intensity between the warped and un-warped images. These images, however, would have very different distributions of pixel intensities. We therefore also compared how similar or dissimilar the distributions of intensities were across pairs of images, by first computing pixel intensity histograms for each image and then calculating the Earth Mover's Distance (EMD, also known as the Wasserstein distance) to define the similarity of those histograms. This metric has been used to measure image histogram similarity for color-based image retrieval, texture metrics, and shape matching (27-33). The EMD calculates the amount of "transportation work" needed to transform one histogram into another, effectively penalizing differences in nearby histogram bins less than differences that require moving histogram "mass" between more distant bins (34,35) (Fig S1). For this measure, the total reported EMD between a pair of worm images is the sum of the EMDs between matching checkerboard sections.
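For histograms over identical, unit-spaced intensity bins, the 1D EMD has a simple closed form: the L1 distance between the cumulative distributions. The following minimal implementation (our own sketch, not the paper's code) makes this concrete:

```python
from itertools import accumulate

def emd_1d(hist_a, hist_b):
    """Earth Mover's Distance between two histograms over identical,
    unit-spaced bins: the L1 distance between their normalized CDFs."""
    total_a, total_b = sum(hist_a), sum(hist_b)
    # Per-bin difference of the normalized histograms...
    diffs = [a / total_a - b / total_b for a, b in zip(hist_a, hist_b)]
    # ...accumulated into a CDF difference, then summed in absolute value.
    return sum(abs(c) for c in accumulate(diffs))
```

Moving all of the mass by two bins costs twice as much as moving it by one bin, which is exactly the distance-weighted "transportation work" the text describes.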

Using longitudinal brightfield and fluorescent image timeseries acquired in our Worm Corral system, we selected images from 3-7 days post-hatch (corresponding to young through mid-adulthood), resulting in a dataset of 6527 individual C. elegans images (36,37) (see Methods). We defined three sets of image pairs to compare: lab-frame vs. worm-frame images; consecutive lab-frame images of the same individual; and random pairs of lab-frame images from different individuals. The worm-frame images were straightened as described above and the above measures of differences among image pairs were calculated. Overall, the differences induced by the worm-straightening procedure were dramatically smaller than those between

independently at each point along the AP axis to produce worm-frame images with a standardized nose-to-tail "width profile". Lab-frame images from individuals of variable shapes and sizes can thus be standardized to enable direct comparison (Fig 3). However, shape standardization alone is not sufficient to account for individual

Moreover, better internal coordinates can also improve the ability to make correspondences between different lab-frame images: for example, being able to define a region to compare across every lab-frame image as "the portion of each animal from the tip of its nose to the center of the anterior pharyngeal bulb" is more precise and biologically meaningful than

We evaluated the accuracy of our landmark-prediction CNNs by measuring the absolute distance (in pixels) between the predicted AP coordinate and the manually annotated ground-truth value, using a test set of images that were not used at any point in training the CNNs. We

We then examined how our automated system performed with respect to these best- and worst-case scenarios, across several different model parameters.
For the Gaussian keypoint maps, we examined different variance parameters, resulting in larger or smaller hotspots (Fig 4C). For sigmoid coordinates we varied the slope parameter, tailoring the sharpness of the transition between positive and negative values (Fig 4C). We found that both 1D Gaussian keypoint maps and sigmoid coordinates are robust, achieving accuracy as good as or better than human raters across a wide range of parameter values (Fig 5). As a difference of one pixel translates to 1.3 microns in our optical system, even the worst single error in the worst-performing model was less than 120 microns (or approximately one tenth of the length of an average adult animal) from its true position (Table S2). Based on these results we conclude that the sigmoid coordinate and 1D Gaussian keypoint maps are sufficient representations of anatomical landmark location. The sigmoid coordinate map with a slope of 0.5 performed the best overall, with the lowest total error averaged across all landmarks (Table S2). We therefore used this scheme throughout the subsequent experiments and analyses.
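To make the two encodings concrete, the sketch below generates each map for a landmark on a 960-pixel standard worm and decodes the AP coordinate back out. Parameter choices are illustrative, and we use tanh as the sigmoid; both are assumptions of this sketch rather than the paper's exact formulation.

```python
from math import exp, tanh

def gaussian_keypoint_map(length, landmark, sigma):
    """1D Gaussian 'hotspot' whose peak marks the landmark's AP coordinate."""
    return [exp(-0.5 * ((x - landmark) / sigma) ** 2) for x in range(length)]

def sigmoid_coordinate_map(length, landmark, slope):
    """Values sweep from -1 anterior of the landmark to +1 posterior of it;
    the zero crossing marks the landmark. Slope controls transition sharpness."""
    return [tanh(slope * (x - landmark)) for x in range(length)]

def decode_gaussian(keypoint_map):
    """AP coordinate = position of the hotspot maximum."""
    return max(range(len(keypoint_map)), key=lambda i: keypoint_map[i])

def decode_sigmoid(coord_map):
    """AP coordinate = position of the zero crossing."""
    return min(range(len(coord_map)), key=lambda i: abs(coord_map[i]))
```

Both representations encode a single scalar (the landmark's AP position) as a whole image row, which is what makes them usable as targets for image-to-image regression.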

Although the keypoint map representations could successfully be used to predict the four anatomical landmarks, some landmarks were easier for the model to identify than others. The two pharyngeal bulbs were the easiest anatomical landmarks for the model to locate, while the vulva was the hardest. Visual inspection of the worst-localized landmarks shows that the pipeline failed mostly on images where even human raters had difficulty identifying specific structures (Fig S2).

Anatomical variation across individuals and over time

We next used these tools to examine how, at both the individual and population levels,

(Fig 6B, right). The two pharyngeal bulbs are located at around 11% and 19% of the total length, respectively, while the average vulval position is at 50% of the worm length. The tail location (which, as above, does not refer to a specific anatomical structure but rather to the last point of optical density visible at 10× magnification in our system) is also consistently around 94% of the worm length (Fig 6B right, Table S4). Using units of either absolute pixels or of relative

(Table S3 vs Table S4). This similarity results from both the low variability in length across the population

were determined by coupled processes, then larger-than-average worms would typically have both pharynges and gonads/intestines that were larger than average. Alternatively, if the sizes of these compartments were determined by independent processes, above-average body length could be driven by above-average pharyngeal size, above-average intestinal and/or gonad size, or both (at a correspondingly lower rate). To address this question, we examined whether pharyngeal size was correlated with the size of the rest of the body, mostly determined by the intestinal and gonad size (Fig 6C, top). The correlation between pharyngeal and remaining body length was modest at best (Fig 6C, left), suggesting that there is only a weak coupling between the developmental programs that determine pharynx size and those that determine the sizes of the intestine, gonad, and other organs. We also examined the related question of where the vulva is placed within the intestine/gonad. The intestine/gonad compartment is defined as the region between the posterior pharyngeal bulb and the tail landmarks (Fig 6C, top). We examined vulval placement

correlation between these lengths (Fig 6C, right).
Overall, these results suggest that there is no tight developmental coupling either in the relative scaling of the pharynx compared to the remaining body (Fig 6C, left) or in the position of the vulva with respect to the intestinal/gonad compartment (Fig 6C, right).
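The compartment-scaling question above reduces to computing a correlation between per-worm compartment lengths. A minimal sketch with hypothetical measurements (the function and the numbers are ours, not drawn from the dataset):

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical per-worm lengths in pixels (illustrative only): pharynx vs.
# the remainder of the body (posterior bulb to tail landmark).
pharynx = [105, 110, 98, 115, 102]
rest_of_body = [830, 815, 790, 870, 845]
r = pearson_r(pharynx, rest_of_body)
```

An r near 1 would indicate tightly coupled compartment sizes; the modest correlations reported in Fig 6C point instead toward largely independent size determination.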

A subset of individuals died or came to within one day of death during the timeline of our dataset (2-6 days post-hatch). Near-death changes in body volume have been observed previously, with a sharp decrease in body size about a day prior to death followed by a recovery of pre-death size after death (3,44). We observed this decrease in body length among the short-lived subset of individuals, beginning approximately two days prior to death (Fig 6D, left). We next examined whether this shrinkage occurs only in one or another set of tissues and/or organs. The largest shrinkage in absolute terms occurs in the intestinal/gonad compartment (Fig 6D, left); however, that is also the largest compartment in the worm body. By

An important step in developing machine-learning-driven image analysis tools is to determine the best data representation on which to train. We found empirically that regression techniques to transform an input image directly into a list of (x, y) coordinates for each image landmark, which have been successfully applied to images of human faces and other similar tasks (45,46), did not function well in this setting. Instead, we found that image-to-image regression, where the output was not the coordinates of the landmark but instead an image that implicitly and robustly encoded that location, was more successful. Ultimately, this approach was able to predict anatomical landmarks with the same accuracy as experienced human annotators, but with substantially reduced time and effort.

In this work, we demonstrated how automatically identified landmarks can be used for basic morphometric tasks, examining growth, shrinkage, and organ/tissue scaling. However,

To produce worm-frame images with a standardized length and width profile that matches that of the "average" worm, we sampled the lab-frame image with a different number of steps along the centerline to "stretch" or "shrink" a given individual to a longer or shorter

The criteria for annotating the anatomical landmarks were as follows: slightly past the anus, which is not always directly visible in our images. We randomly grouped these annotated images into "train", "validation", and "test" datasets (4515 "train" images, 1322 "validation" images, and 690 "test" images). These datasets were used to train, evaluate, and benchmark the neural networks, respectively. During the training process, the validation dataset was used to ensure the neural networks were not overfitting to the training data. The test dataset, however, was never used during the training process and was only used to benchmark the neural network performance. We ensured that all images from the same individual were grouped into the same dataset, to prevent overfitting to the idiosyncrasies of particular individuals.

on the "test" images (data not shown). Note that the model was neither trained on any images from the "test" dataset, nor were those images used to select among different models.
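The grouped train/validation/test assignment can be sketched as follows: split at the level of individuals rather than images, so that no worm contributes images to more than one dataset. The function name and split fractions here are our own illustrative choices; the paper's actual counts were 4515/1322/690 images.

```python
import random
from collections import defaultdict

def split_by_individual(image_ids, worm_ids, fractions=(0.7, 0.2, 0.1), seed=0):
    """Assign images to train/validation/test datasets so that all images
    of a given individual land in the same split, preventing the model from
    memorizing the idiosyncrasies of particular worms."""
    by_worm = defaultdict(list)
    for img, worm in zip(image_ids, worm_ids):
        by_worm[worm].append(img)
    worms = sorted(by_worm)
    random.Random(seed).shuffle(worms)  # deterministic shuffle of individuals
    n = len(worms)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    split_worms = {"train": worms[:n_train],
                   "validation": worms[n_train:n_train + n_val],
                   "test": worms[n_train + n_val:]}
    return {name: [img for w in ws for img in by_worm[w]]
            for name, ws in split_worms.items()}
```

Splitting by individual is stricter than splitting by image: a per-image split would leak near-duplicate images of the same worm across datasets and inflate test accuracy.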

For the anatomical landmark prediction models we adapted the "Coordinate

hotspot map. After this transformation, we identified the AP coordinate as described above.

To convert the AP coordinates from the standard worm (960 pixels long, with a standardized width profile) to those of any specific individual, we multiply by (worm length)/960, where worm length is calculated as above.
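This conversion is a one-line proportional rescaling; a minimal sketch (the function name is ours):

```python
def standard_to_individual_ap(ap_standard, worm_length, standard_length=960):
    """Convert an AP coordinate in the 960-pixel standard worm frame to the
    corresponding coordinate in a specific individual's worm frame, by
    scaling by the ratio of the individual's length to the standard length."""
    return ap_standard * worm_length / standard_length

# e.g. a landmark at pixel 480 of the standard worm maps to pixel 600
# of an individual whose worm-frame image is 1200 pixels long.
```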

Inter-rater agreement

As a benchmark, we compared the distance between the human-annotated ground-truth landmark locations and either the locations generated by our neural network models or those annotated by other human raters. We asked three raters well versed in C. elegans anatomy to identify the four anatomical landmarks in 200 worm-frame images sampled from the full dataset. As worms age, their organs become more visibly disordered, making it increasingly difficult to identify the anatomical landmarks. The 200 images were thus sampled equally from 10 age bins ranging from young to old adulthood, to ensure images from all ages were represented.

To account for systematic bias in each human rater's annotations (e.g. perhaps one rater systematically annotates the pharynges as ten pixels anterior of where another would), we first calculated a bias term, defined as the average signed distance between a rater's annotations

respectively. Both parameters, illustrated here, control how "sharply" vs. "fuzzily" the keypoint map encodes the landmark location.
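The bias correction for rater comparisons can be sketched as follows (a minimal illustration with our own names, assuming the bias is the mean signed error of a rater's AP coordinates relative to the reference annotations, removed before computing absolute distances):

```python
def debiased_distances(rater_coords, truth_coords):
    """Subtract a rater's systematic offset (mean signed error) before
    measuring absolute distances to the reference AP coordinates.
    Returns (list of debiased absolute distances, bias)."""
    bias = sum(r - t for r, t in zip(rater_coords, truth_coords)) / len(truth_coords)
    residuals = [abs(r - bias - t) for r, t in zip(rater_coords, truth_coords)]
    return residuals, bias
```

A rater who places every landmark exactly ten pixels posterior of the reference would thus score a bias of 10 but zero residual error, isolating true disagreement from a consistent offset.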