Crowdsourced geometric morphometrics enable rapid large-scale collection and analysis of phenotypic data

Advances in genomics and informatics have enabled the production of large phylogenetic trees. However, the ability to collect large phenotypic datasets has not kept pace. Here, we present a method to quickly and accurately gather morphometric data using crowdsourced image-based landmarking. We find that crowdsourced workers perform similarly to experienced morphologists on the same digitization tasks. We also demonstrate the speed and accuracy of our method on seven families of ray-finned fishes (Actinopterygii). Crowdsourcing will enable the collection of morphological data across vast radiations of organisms, and can facilitate richer inference on the macroevolutionary processes that shape phenotypic diversity across the tree of life.


Integrating phenotypic data, such as anatomy, behavior, physiology, and other traits, with […]

Collecting landmark-based geometric morphometric data at scale permits detailed analysis of different sources of error, such as among- and within-observer variation (Von Cramon-Taubadel et al. 2007). To assess whether data gathered by workers recruited through Amazon Mechanical Turk differed significantly in quality from traditionally collected data, we asked turkers (n = 21) and experts (n = 8) to landmark a set of five fish images, five times each. All participants used the same protocol and the same software to digitize the same set of fishes. The landmarks were carefully selected based on previously published literature concerning fish shape (Supplemental Figure S2; Fink & Zelditch 1995; Cavalcanti et al. 1999; […]

[…] rate varies depending on the application, but here we use a 25% misprediction rate as the standard for sufficient accuracy. This is a highly forgiving standard: a 50% misprediction rate is no better than a coin flip, and a 25% misprediction rate would still erroneously classify one in four turkers as experts, or vice versa. We also use quadratic discriminant analysis (QDA), which relaxes some of the assumptions of LDA (notably that all groups share a common covariance matrix), and similarly report the QDA misclassification rate.

We calculated the per-individual median shape for each species used, as well as the consensus turker and morphologist shapes, and projected these shapes into Procrustes space to visualize the orthogonalized differences in median shape among and between the types of digitizers.
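The misclassification criterion can be illustrated with a small leave-one-out sketch. It is an assumption, not the study's code: for brevity it swaps in a nearest-centroid classifier, which is what LDA reduces to when both groups share a spherical covariance matrix and have equal priors, and the leave-one-out scheme, function names, and toy data are all hypothetical. A full analysis would use a dedicated LDA/QDA implementation.

```python
import math


def nearest_centroid_predict(train_x, train_y, x):
    """Assign x to the class whose mean vector is closest in Euclidean distance.

    Equivalent to LDA only under a shared spherical covariance and equal priors
    (a simplifying assumption for this sketch).
    """
    sums, counts = {}, {}
    for vec, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for j, v in enumerate(vec):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, math.inf
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = math.dist(centroid, x)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_misclassification_rate(xs, ys):
    """Fraction of configurations misclassified when each is held out in turn."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)


def distinguishable(rate, threshold=0.25):
    """True if the classifier meets the 25% acceptable misclassification standard."""
    return rate <= threshold


# Hypothetical flattened landmark configurations (x1, y1, ...), not study data.
xs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
ys = ["expert"] * 3 + ["turker"] * 3
rate = loo_misclassification_rate(xs, ys)
print(rate, distinguishable(rate))
```

In this deliberately separated toy example the groups are easy to tell apart. The study's finding corresponds to rates that stay above the 0.25 threshold, that is, groups the classifier cannot reliably separate.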

Example: a phenomic pipeline for comparative phylogenetic analysis

A common strategy in fish comparative studies is to examine evolutionary dynamics within a single family (Ferry-Graham et al. 2001; Alfaro et al. 2005, 2007; Rocha et al. 2008; Hernandez et al. 2009; Dornburg et al. 2011; Frédérich et al. 2013; Santini et al. 2013; Sorenson et al. 2013; Claverie & Wainwright 2014; Thacker 2014), potentially because of the extensive time needed to collect data. To test whether our method can improve on this situation when geometric morphometrics is the data collection method, we use the average time it took an expert to digitize a single fish image to predict the time it would take a single individual […]

To demonstrate the utility of obtaining comparative data with this method, we use previously published phylogenies for seven fish families: Acanthuridae (Sorenson et al. 2013 […] and used these principal component axes for subsequent analyses.

We used Bayesian Analysis of Macroevolutionary Mixtures (BAMM; Rabosky 2014) to estimate rates of speciation and body shape evolution for all seven families. For the characters describing body shape, we use the PC axes whose eigenvalues exceeded the corresponding ran[…] (without data) to exclude rate heterogeneity that occurred solely due to stochastic processes.
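The time prediction rests on simple arithmetic: the number of images times the average minutes per image, divided by the number of people working in parallel. The sketch below makes that explicit. The function name, its parameters, and the example numbers are hypothetical, and it assumes ideal parallelism with no waiting for workers to pick up tasks.

```python
def predicted_hours(n_images: int, minutes_per_image: float,
                    n_workers: int = 1, replicates: int = 1) -> float:
    """Predicted wall-clock hours to landmark every image `replicates` times.

    Assumes the work splits evenly across `n_workers` people working in
    parallel and ignores queueing or waiting time (a hypothetical idealization).
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    total_minutes = n_images * replicates * minutes_per_image
    return total_minutes / n_workers / 60.0


# Hypothetical numbers, not values reported in this study.
single_expert = predicted_hours(1200, 2.0)
crowd = predicted_hours(1200, 2.0, n_workers=20)
print(f"one expert: {single_expert:.1f} h; 20 parallel workers: {crowd:.1f} h")
```

Because real crowdsourced jobs spend much of their elapsed time waiting for workers, this estimate is a lower bound on the crowdsourced wall-clock time rather than a forecast.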

For nearly all landmarks, turkers differ from the expert consensus by only a few tens of pixels […] reliably distinguish between these two groups, for any given family. Although for some images the classifier performed slightly better than a 50% coin flip, in all cases our model fell short of the one-in-four (25%) acceptable misclassification rate. We conclude that, for any given sample of landmarks, it is difficult to statistically distinguish between expert-provided and turker-provided landmark configurations.

We projected turker and expert shape configurations into morphospace (Figure 2, Supplemental Figure S4). Although the overall space occupied by each family's shape configurations varies, in practice the aggregated median turker and expert shapes are not qualitatively different.
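Projecting configurations into morphospace first requires removing differences in position, size, and orientation. In practice this is done with a generalized Procrustes analysis over all specimens, followed by an ordination. The sketch below shows only the core pairwise step for 2-D landmarks: an ordinary Procrustes fit of one configuration onto a reference, written with complex numbers. The function names are hypothetical, and reflections are not allowed.

```python
import cmath
import math


def to_complex(landmarks):
    """Encode (x, y) landmark pairs as complex numbers x + iy."""
    return [complex(x, y) for x, y in landmarks]


def center_and_scale(z):
    """Remove location (subtract centroid) and scale (divide by centroid size)."""
    mean = sum(z) / len(z)
    centered = [p - mean for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in centered))  # centroid size
    return [p / size for p in centered]


def procrustes_align(shape, reference):
    """Rotate `shape` onto `reference` after removing location and scale.

    Returns the aligned (x, y) points and the partial Procrustes distance
    between the two unit-size, centered configurations.
    """
    z = center_and_scale(to_complex(shape))
    w = center_and_scale(to_complex(reference))
    # Complex inner product <z, w>; its phase is the optimal rotation angle.
    c = sum(a.conjugate() * b for a, b in zip(z, w))
    rotation = cmath.exp(1j * cmath.phase(c))
    aligned = [p * rotation for p in z]
    # For unit-size shapes, ||z * rotation - w||^2 = 2 - 2|c|.
    dist = math.sqrt(max(0.0, 2.0 - 2.0 * abs(c)))
    return [(p.real, p.imag) for p in aligned], dist


# Hypothetical triangle, and the same triangle rotated 90 degrees, tripled, shifted.
reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
shape = [(5.0, 5.0), (5.0, 11.0), (2.0, 5.0)]
aligned, d = procrustes_align(shape, reference)
print(round(d, 6))
```

A configuration that differs from the reference only by translation, rotation, and scale has a Procrustes distance of essentially zero. Genuine shape differences, such as a misplaced dorsal-fin landmark, show up as a nonzero distance.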

The only exception is the triggerfishes (Balistidae), likely because turkers were confused about the exact location of the dorsal fin, given this family's reduced anterior dorsal fin. […] Figure S5). The BAMMtools analysis uncovered substantial heterogeneity in the rates of body shape evolution and speciation in each family (Figure 5).

Significant shifts in the rate of shape evolution or speciation were detected in three families: […]

One advantage of the crowdsourced method we develop here is that inter-observer error can be readily assessed. Traditional geometric morphometric studies often rely on a single observer, both for practical reasons (the pool of trained geometric morphometricians is limited) and to avoid individually driven systematic biases in data collection. Although this common practice may reduce bias, it also precludes meaningful assessment of differences among observers.

Our results show that inter-observer variance can be substantial for some landmarks even […] inter-observer error is not ignored or bypassed because of the difficulty of assessing it.

In our analysis, we assessed the quality of a variety of landmarks between turkers and experts […]

Our novel pipeline to download images, upload them to Amazon MTurk, and process them using BAMM and BAMMtools showcases the ability to rapidly collect phenotypic data. Most of the time taken to collect these data was spent waiting for worker results; however, a majority of the data had already been collected by the 1-hour mark. An online methodology could conceivably reduce this analysis time by iteratively refining its results as new data stream in from Amazon's servers.
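One way such an online methodology might refine its estimates is to update per-landmark summaries as each worker's placement arrives, instead of waiting for the full batch. The sketch below is an assumption, not part of the described pipeline. It uses Welford's online algorithm for the running mean and variance of one landmark. The analyses here use medians, but exact running medians require keeping every observation, so the mean is a simpler stand-in. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningLandmark:
    """Online (Welford) summary of one landmark's (x, y) placements."""
    n: int = 0
    mean_x: float = 0.0
    mean_y: float = 0.0
    m2: float = 0.0  # running sum of squared deviations, x and y pooled

    def update(self, x: float, y: float) -> None:
        """Fold in one new placement as it arrives from a worker."""
        self.n += 1
        dx = x - self.mean_x
        dy = y - self.mean_y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Welford update: old deviation times new deviation, per coordinate.
        self.m2 += dx * (x - self.mean_x) + dy * (y - self.mean_y)

    def total_variance(self) -> float:
        """Sample variance of x plus sample variance of y (NaN until n >= 2)."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


# Hypothetical placements of one landmark arriving one at a time.
lm = RunningLandmark()
for x, y in [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]:
    lm.update(x, y)
    print(lm.n, round(lm.mean_x, 3), round(lm.mean_y, 3), lm.total_variance())
```

Each update costs constant time and memory, so a consensus position and its spread are available at any moment during collection. A pipeline could, for example, stop requesting replicates for a landmark once its spread stabilizes.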

Although there are limitations in the type and accuracy of data that can be collected through […]

Figure 1: Per-family breakdown of accuracy vs. precision for each landmark (axes: accuracy, median turker − median expert; precision, log turker variance / expert variance). Accuracy is represented as the difference between the median turker location for that landmark and the median expert location, with the expert location assumed to be the true location. Precision is represented as the log-ratio of median absolute deviations between turkers and experts. More positive numbers indicate better expert precision, whereas more negative numbers indicate better turker precision. Points highlighted in red are those determined to be outliers.

[Figure: fraction of image set complete vs. time taken (in minutes) to receive data for one unique replicate.]
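The two Figure 1 metrics can be computed per landmark as sketched below. The caption defines accuracy as the offset of the median turker location from the median expert location, and precision as the log-ratio of turker to expert median absolute deviations. It does not say how a median absolute deviation is taken for 2-D points, so this sketch assumes the median Euclidean distance from the coordinate-wise median, and it uses the natural logarithm. Both choices, the function names, and the example coordinates are hypothetical.

```python
import math
import statistics


def coord_median(points):
    """Coordinate-wise median of a list of (x, y) placements."""
    xs, ys = zip(*points)
    return statistics.median(xs), statistics.median(ys)


def mad_2d(points):
    """Median Euclidean distance from the coordinate-wise median.

    One possible 2-D analogue of the median absolute deviation (an assumption).
    """
    center = coord_median(points)
    return statistics.median(math.dist(p, center) for p in points)


def accuracy(turker_pts, expert_pts):
    """Pixel distance between median turker and median expert placements."""
    return math.dist(coord_median(turker_pts), coord_median(expert_pts))


def precision(turker_pts, expert_pts):
    """log(turker MAD / expert MAD); positive values mean experts were more precise.

    Raises ZeroDivisionError if every expert placement sits exactly on the median.
    """
    return math.log(mad_2d(turker_pts) / mad_2d(expert_pts))


# Hypothetical placements of one landmark, in pixels.
experts = [(100.0, 100.0), (101.0, 100.0), (100.0, 101.0)]
turkers = [(103.0, 104.0), (105.0, 104.0), (103.0, 106.0)]
print(accuracy(turkers, experts), precision(turkers, experts))
```

Medians rather than means keep a single careless placement from dominating either metric, which matters for crowdsourced data where occasional gross errors are expected.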