No evidence for motion dazzle in an evolutionary citizen science game

The motion dazzle hypothesis posits that high contrast geometric patterns can make it difficult to track a moving target, and has been argued to explain the patterning of animals such as zebras. Research to date has tested only a small number of patterns, offering equivocal support for the hypothesis. Here, we take a genetic programming approach that allows patterns to evolve based on their fitness (time taken to capture), and thus find the optimal strategy for providing protection when moving. Our 'Dazzle Bug' citizen science game tested over 1.5 million targets in a touch screen game at a popular visitor attraction. Surprisingly, we found that targets lost pattern elements during evolution and came to closely match the background. Modelling results suggested that targets with lower motion energy were harder to catch. Our results indicate that low contrast, featureless targets offer the greatest protection against capture when in motion, challenging the motion dazzle hypothesis.


Introduction
and direction [15] perception. There is also evidence that some orientations of stripes can interfere with the ability to track one target within a larger group [16-18]. Finally, modelling work has suggested that striped patterns may be particularly prone to creating erroneous motion signals in the visual system, which may underlie these types of behavioural effects [19].
Despite these findings, not all research has supported the motion dazzle hypothesis. Some studies on humans have found that striped targets are easier to capture than non-patterned targets [20,21], and moving cuttlefish have been shown to preferentially display low contrast patterns [22]. Similarly, a recent study using natural predators hunting patterned prey found no evidence for a benefit of motion dazzle patterning compared to uniform coloration [23]. Even studies that have argued for an effect of motion dazzle patterning have normally shown that striped patterning offers no benefit in capture success over a luminance-matched non-patterned target, suggesting that the benefit of stripes may not be unique [9,11,14,21].
One limitation of previous studies is that they have tested a relatively small range of patterns, often chosen arbitrarily. This means that it is not yet clear whether we have truly discovered the optimal patterning type for providing protection when in motion; there may be more effective options than those tested so far. However, the small-scale psychophysics-style experiments used to date make it difficult to test large numbers of patterns. We therefore took a novel approach, using genetic programming to allow the patterning of targets to 'evolve' across generations in response to capture success [24-26]. In this way, we can ask which patterning strategy is optimal, given the almost infinite number of possible patterns that can be generated.
To obtain the large amount of data required for this approach, we ran our experiment as a citizen science game ('Dazzle Bug') at a popular visitor attraction. Participants played the game by tapping the moving targets ('bugs') with their finger as quickly as possible in order to 'catch' them (Figure 1). We ran four replicates of the evolutionary process, each with three populations of different speeds, to assess whether the optimal patterning changes as a function of target movement speed.

Our first aim was to demonstrate a fitness increase in our experimental populations, which we defined as an increase in the average capture time across generations. We did this by comparing our data to a simulation run of the evolutionary algorithm that used randomised capture times. We then investigated how target patterning changed across generations for the different speed populations, using image analysis to measure contrast and the presence of stripes at different orientations. We also looked at whether selection rates differed between the speed populations, using the Lande and Arnold […]

Is there a fitness increase for the experimental populations, and does this differ from the null […]
with the fast bugs being hardest to catch, followed by the medium bugs and finally the slow bugs (χ² = 50892.85, p < 0.001). There was a considerable level of noise in the data, which is to be expected given the wide range of participants and the fast reactions required. Nevertheless, there was also a significant increase in fitness across generations (χ² = 208.72, p < 0.001). Increases were often particularly obvious in the early generations of the game.
Figure 3: Experimental data (left) and control data (right) compared across 40 generations for the three different speed populations. Experimental data have been collapsed across all four replicates. All raw data points are plotted, and the curves are fitted using splines with two degrees of freedom.

The experimental data also show a significant difference in fitness change compared to the null data (interaction between dataset and the second order effect of generation: χ² = 161.985, p < 0.001). The experimental data show a characteristic quadratic shape, with an initial increase that flattens off (Figure 3). We therefore have evidence for a fitness increase in our experimental populations, suggesting that selection is acting to optimise patterning.

The data allow us to determine the main selection pressures operating on each population of bugs within each generation (normalised linear selection rates, β), so that we can assess whether these pressures change over evolutionary time. Selection rates differed across generations for luminance (χ² = 12.815, p = 0.002), vertical stripes (χ² = 11.593, p = 0.003) and diagonal stripes (χ² = 6.647, p = 0.036). There was no evidence for differences in selection rates for either the horizontal stripe metric (χ² = 1.705, p = 0.426) or the right edge metric (χ² = 5.486, p = 0.064).
The standard deviation of the luminance of the bugs appears to be particularly important for the 'fast' population; selection pressure is strong, particularly in early generations, and differs from the selection rate seen in the 'medium' and 'slow' populations (Figure 6 […] predictions [19]. In addition, the targets with the highest bias also tended to be relatively stripy and high contrast (bugs with higher bias had both higher standard deviations of luminance, F = 10.844, p = 0.001, and higher levels of vertical stripes, F = 35.688, p < 0.001), again suggesting that these "motion dazzle" type patterns might be expected to create illusory motion signals.

However, these results do not seem to explain our evolutionary findings, in which we saw a strong tendency for targets to become lower contrast and non-patterned. A second metric from our motion modelling is the motion energy, which can be conceptualised as how salient or visible the motion is. Here, there is a very different relationship with fitness (Figure 8): low motion energy targets (which tend to be low contrast and have little patterning) had higher fitness than those with higher motion energy (which tend to have high contrast and strong patterning) (F = 4.391, p = 0.027; F = 4.989, p = 0.026 if data were not filtered to exclude cases with a circular mean difference greater than 60 degrees). Bugs with higher mean vector lengths had both higher […]

Using a large-scale evolutionary citizen science game, we found no evidence that putative 'motion dazzle' patterning can offer protection when in motion; despite predictions that high contrast, geometric patterning should cause visual illusions that make targets harder to catch, the targets consistently evolved to become less patterned and lower contrast. This happened for all speeds tested and all replicates of the experiment, although these changes seemed to occur more quickly in populations with faster speeds. Motion modelling suggested that these results could be a consequence of the motion energy of the stimulus, as this correlated with capture time, with lower motion energy targets being more difficult to catch. Our results have important consequences for our understanding of the evolution of stripes, and for how animals should best protect themselves from capture when in motion.
Our results are perhaps surprising in the context of most of the literature on motion dazzle to date, which has suggested that striped targets are relatively difficult to catch or can cause illusions of speed or direction perception [9-12,14-18]. However, there is also considerable evidence in the literature that uniform grey targets are relatively difficult to catch, in some cases perhaps even harder to catch than striped targets. For example, grey targets consistently survive well in capture studies [9,11,14,21]. Similarly, in tracking tasks, low contrast parallel stripes were found to be more difficult to track than high contrast parallel stripes [18], arguing against a motion dazzle explanation. Recent work has also suggested that in some cases striped patterns are only difficult to catch when the targets are moving quickly enough to blend into uniform grey via the "flicker-fusion" effect [43].
Our results therefore suggest that uniform grey targets had a survival advantage over other types of target patterning, leading them to become fixed as the optimal strategy in all our populations, regardless of speed or replicate.
Motion modelling has previously suggested that stripes should create erroneous motion signals that are both highly coherent and biased [19], implying that striped prey should be more difficult to catch. However, to our knowledge, modelling results have not previously been compared to behavioural data. Our large dataset therefore offers an ideal opportunity to test whether motion modelling outputs do indeed correlate with capture times. In support of the motion dazzle hypothesis [19], we found that highly coherent and biased targets tend to be more difficult to catch than coherent targets with less bias, and that the most biased and coherent targets are often stripy. However, this clearly does not explain the results of the evolutionary game. We therefore considered another metric that can be calculated from motion models, namely the motion energy, and found that this also correlated with capture success. Targets with low motion energy (which tended to be uniform grey) were harder to catch than targets with high motion energy (which had much higher contrast and more patterning).
Why does background matching (reducing motion energy) seem to be a better predictor of the outcomes of our evolutionary games than motion dazzle strategies that maximise the bias and coherence metrics? We speculate that motion energy is a very consistent signal: regardless of the trajectory or speed of the bug, targets with low visibility will be harder to catch than those that are highly visible.
We propose that the effects of stripes may be much more dependent on the particular orientation of the stripes, given that the most effective striped targets appeared to […]
We used three different speed populations to assess whether there were differences in the patterns that evolved. As expected, there were strong differences in capture difficulty between speed populations, with fast targets being the hardest to capture, but we found no evidence for differences in the target patterning that evolved, with all populations becoming uniform grey. This agrees with previous work suggesting that there is no interaction between target speed and prey patterning [11], at least for speeds below those needed to create a "flicker-fusion" effect. However, we did find increased selection in the "fast" populations, particularly early in the evolutionary process for the contrast metric and later for the vertical stripe metric. This may simply reflect the higher difficulty of these targets, which is likely to produce a wider range of capture times and thus offer more variation for selection to act on, potentially exaggerating the selection process.
Overall, we find limited evidence for motion dazzle effects in a citizen science evolutionary game, which we believe is the most comprehensive test of this hypothesis to date. Stripes were able to cause motion illusions and make targets harder to catch in some scenarios, meaning that there may still be specific cases where motion dazzle is at least part of the explanation for the evolution of striped patterns. However, our results suggest that uniform grey targets are a more stable optimal solution.

Subjects
We did not collect any demographic data from participants. This was to streamline participation in the study (which was conducted in a busy exhibition space), and also because it would be difficult to verify the accuracy of the information provided. To overcome the limitations of being unable to account for participant age, handedness and gender, we collected a large sample size of participants […]
The game had a similar format to many previous studies testing motion dazzle effects [10,11,21]: participants were presented with a small rectangular target (75 × 100 pixels, or 29.0 × 38.6 mm; visual angle 2.76° × 3.69°), which they had to try to 'catch' by touching it with their finger as quickly as possible after it had appeared (Figure 1). Targets began their movement at a random position on the screen and moved along a linear trajectory. The angle of movement changed throughout a trial, both at the edge of the target arena via reflection (to ensure that the target remained visible to the participant) and randomly during movement (once every half second, and whenever an unsuccessful capture attempt was made; the new angle was chosen randomly within 90 degrees either side of the previous angle). Targets could be presented at one of three speeds, fast, medium or slow (600, 450 or 300 pixels per second respectively, independent of frame rate, equating to 231.8, 173.8 and 115.9 mm/s), and each participant was presented with a random mix of targets of all three speeds. Participants had 5 seconds to catch each target. After the target had been caught, or the time-out limit had been reached, the game moved automatically on to the next target. A game consisted of 20 trials in total, with targets randomly selected from the current generation.
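The movement rules above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the game's actual code: the arena size (`ARENA_W`, `ARENA_H`) is hypothetical because the screen resolution is not given, and the turning rule is interpreted as a new heading drawn uniformly within ±90 degrees of the previous one. Displacement per frame is speed × elapsed time, which is what makes the speed independent of frame rate.

```python
import math
import random

# Hypothetical arena size in pixels; the text does not give the screen resolution.
ARENA_W, ARENA_H = 1920, 1080
TARGET_W, TARGET_H = 75, 100                              # target size in pixels (from the text)
SPEEDS = {"fast": 600.0, "medium": 450.0, "slow": 300.0}  # pixels per second (from the text)
TURN_INTERVAL = 0.5                                       # seconds between random turns


class Target:
    """Sketch of a 'bug' moving in straight lines with edge reflections and random turns."""

    def __init__(self, speed_name, rng):
        self.rng = rng
        self.speed = SPEEDS[speed_name]
        # Random start position with the whole target inside the arena.
        self.x = rng.uniform(0, ARENA_W - TARGET_W)
        self.y = rng.uniform(0, ARENA_H - TARGET_H)
        self.angle = rng.uniform(0, 2 * math.pi)  # heading in radians
        self.since_turn = 0.0

    def turn(self):
        """Assumed rule: new heading drawn uniformly within +/-90 degrees of the old one.
        Called every TURN_INTERVAL seconds, and by the game loop after a missed tap."""
        self.angle += self.rng.uniform(-math.pi / 2, math.pi / 2)

    def step(self, dt):
        """Advance by dt seconds; displacement is speed * dt, so speed is frame-rate independent."""
        vx = self.speed * math.cos(self.angle)
        vy = self.speed * math.sin(self.angle)
        self.x += vx * dt
        self.y += vy * dt
        # Reflect off the arena edges so the target always stays visible.
        if not 0 <= self.x <= ARENA_W - TARGET_W:
            vx = -vx
            self.x = min(max(self.x, 0.0), ARENA_W - TARGET_W)
        if not 0 <= self.y <= ARENA_H - TARGET_H:
            vy = -vy
            self.y = min(max(self.y, 0.0), ARENA_H - TARGET_H)
        self.angle = math.atan2(vy, vx)
        # Random re-orientation every half second.
        self.since_turn += dt
        if self.since_turn >= TURN_INTERVAL:
            self.turn()
            self.since_turn = 0.0
```

A 5-second trial at an assumed 60 frames per second would call `step(1 / 60)` 300 times. Calling `turn()` on a missed tap reproduces the second trigger for a direction change described in the text.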

Background photos
Targets were presented against one of 40 naturalistic background photographs (e.g. of grass, tree bark or leaf litter). The background was randomly selected on each trial. The photos were calibrated and converted to greyscale (with an average pixel value of 127).
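The greyscale conversion and mean normalisation can be illustrated as follows. The calibration procedure is not described in the text, so this is only a sketch under two assumptions: luminance uses ITU-R BT.601 luma weights, and the mean is set to 127 by multiplicative scaling with clipping to the 8-bit range.

```python
def to_grey(rgb_rows):
    """Convert rows of (R, G, B) tuples to luminance.
    Assumption: ITU-R BT.601 luma weights; the paper's calibration is not specified."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb_rows]


def normalise_mean(grey_rows, target_mean=127.0):
    """Scale pixel values so the image mean equals target_mean, clipping to 0..255.
    Note: clipping can pull the final mean slightly away from target_mean."""
    values = [v for row in grey_rows for v in row]
    scale = target_mean / (sum(values) / len(values))
    return [[min(255.0, max(0.0, v * scale)) for v in row] for row in grey_rows]
```

An additive shift would preserve absolute contrast instead of relative contrast. Which of the two the original calibration resembled is not stated.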

Pattern generation
The patterns throughout the game were generated through a genetic programming approach [24-26]. This approach does not attempt to mimic biological evolution directly. Instead, it is a method for efficiently exploring an unbounded parameter space, using algorithms inspired by natural selection. The key principle is that the evolutionary process modifies small 'computer programs' that specify the patterning presented on each target. This allows a great deal of flexibility in the complexity of target patterning and reduces the artificial bounds on the evolutionary space that more traditional genetic algorithm methods can introduce [25].
Targets were generated hierarchically, as shown in Figure 9. The 'tree structure' of the program determining the target patterning is composed of two different types of node. One type is the 'terminal node', found on the outer ring of the tree. There were two possible variants of terminal node (each chosen with a probability of 50%). One variant was a flat image of a specific RGB colour (always greyscale) and alpha (transparency) value. The second variant consisted of a specific pre-generated image; there were 66 different initial images from a range of categories, including striped, spotted and noise patterns, at a range of spatial scales (see Figure 10). These base images could also be shifted using an x-offset and a y-offset value (with the patterns wrapping around the target) and rotated (by an angle given in radians). The other type of node was the 'combination node'. Here, two image inputs were combined using one of the following randomly selected nodes:

An example displayed target is shown in the centre of the screen in Figure 9; it was formed by the top combination node of the tree. The inputs to this combination node could either be other combination nodes (as in this example) or could include a terminal node (with 20% probability). The process can be followed backwards until the inputs to a combination node are two terminal nodes (with randomly chosen parameter values), ending that part of the 'tree' and forming an outer edge of base images.
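The tree structure described above can be sketched as nested dictionaries. This is an illustration, not the game's code. The probabilities (50% per terminal variant, 20% terminal input), the 66 base images and the 6-layer depth limit come from the text. The names in `COMBINE_OPS` are hypothetical placeholders because the list of combination nodes is not reproduced here, and offsets are assumed to be fractions of the target size. Rendering the tree to pixels is omitted.

```python
import math
import random

MAX_DEPTH = 6            # maximum tree depth in layers (from the text)
N_BASE_IMAGES = 66       # number of pre-generated base images (from the text)
P_TERMINAL_INPUT = 0.2   # chance that a combination-node input is a terminal (from the text)
# Hypothetical combination operations; the paper's actual list is not shown here.
COMBINE_OPS = ["alpha_blend", "multiply", "add", "max"]


def random_terminal(rng):
    """Terminal node: a flat greyscale colour or a transformed base image (50% each)."""
    if rng.random() < 0.5:
        grey = rng.randint(0, 255)
        return {"type": "flat", "rgb": (grey, grey, grey), "alpha": rng.random()}
    return {
        "type": "image",
        "image_id": rng.randrange(N_BASE_IMAGES),
        "x_offset": rng.random(),                 # assumed: fraction of target width, wraps
        "y_offset": rng.random(),                 # assumed: fraction of target height, wraps
        "rotation": rng.uniform(0, 2 * math.pi),  # radians
    }


def random_tree(rng, level=1):
    """Combination node whose two inputs are subtrees or terminals."""
    children = []
    for _ in range(2):
        # Force a terminal at the depth limit; otherwise terminal with 20% probability.
        if level + 1 >= MAX_DEPTH or rng.random() < P_TERMINAL_INPUT:
            children.append(random_terminal(rng))
        else:
            children.append(random_tree(rng, level + 1))
    return {"type": "combine", "op": rng.choice(COMBINE_OPS), "children": children}


def depth(node):
    """Number of layers in a (sub)tree; a lone terminal counts as one layer."""
    if node["type"] != "combine":
        return 1
    return 1 + max(depth(child) for child in node["children"])
```

Because inputs become terminals only 20% of the time, most randomly generated trees in this sketch reach the depth limit.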

Evolutionary process
Four replicates of the game were run, each containing three separate populations, one for each speed (fast, medium and slow), that evolved independently. The first generation of each population contained 128 individuals that were generated completely at random according to the pattern generation process detailed above. These were presented to players randomly until each had been played five times. At this point, each target was scored by averaging the time taken to catch it, and the bottom half of the generation, based on this measure of fitness, was removed from the population. (Normalisation of participant times was not possible due to the design of the evolutionary algorithm.) The top 64 targets were copied without mutation to form one half of the new generation, and then copied again with mutation to form the other half. The mutation process involved either a random change to a parameter variable (e.g. changing the RGB colour) or selecting a random part of the tree (either a combination node or a terminal node), copying it and pasting it onto another random part of the tree. Pruning then occurred if the mutation increased the depth of the tree beyond the maximum permitted (6 layers). This process could lead to both increases and decreases in target complexity. The mutation rate was randomly selected for each target, with a 0-10% chance of a mutation occurring and the probability weighted towards 0% (i.e. no mutation was most likely, but up to a 10% chance was possible).
The exact number of generations tested varied between replicates because each participant was randomly assigned to one replicate, and because not all replicates were run simultaneously. Replicate 1 had 89 generations, replicate 2 had 87, replicate 3 had 45 and replicate 4 had 46.
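One generation of this truncation-selection scheme might look like the sketch below. Three elements are assumptions, not the authors' implementation: `mutate` is a hypothetical callback standing in for the parameter-change and subtree copy-and-paste operators, the mutation chance is applied once per copy, and the weighting towards 0% is modelled by squaring a uniform draw.

```python
import copy
import random

PLAYS_PER_TARGET = 5  # each target was played five times before scoring (from the text)


def sample_mutation_chance(rng):
    """Per-copy mutation chance in [0, 0.1], skewed towards 0.
    Assumption: squaring a uniform draw; the paper's exact weighting is not given."""
    return 0.1 * rng.random() ** 2


def next_generation(population, play_times, mutate, rng):
    """One generation of truncation selection, sketched from the text.

    population: list of genomes (e.g. pattern trees).
    play_times: for each genome, the list of its capture times in seconds.
    mutate: hypothetical function(genome, rng) -> new genome, e.g. a parameter tweak or a
            subtree copy-and-paste followed by pruning to the 6-layer depth limit.
    """
    # Fitness = mean time to capture; harder-to-catch targets are fitter.
    fitness = [sum(times) / len(times) for times in play_times]
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    survivors = [population[i] for i in order[: len(population) // 2]]
    # Second half: copies of the survivors, each mutated with a small random chance.
    offspring = []
    for parent in survivors:
        child = copy.deepcopy(parent)
        if rng.random() < sample_mutation_chance(rng):
            child = mutate(child, rng)
        offspring.append(child)
    return survivors + offspring
```

With 128 individuals this returns the 64 fittest unchanged, followed by 64 possibly mutated copies, matching the population size described above.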

Control model
We ran a control model to confirm that any systematic patterning changes seen during the real game were due to directional selection, rather than drift or biases within the genetic programming algorithm. This was set up identically to the real experiment, except that instead of participants playing the game, the computer randomly selected a 'capture time' for each target in each generation from a Gaussian distribution based on the mean and standard deviation of each population in the real experiment. Because individual clicks were not recorded in our experimental data, we estimated the variance of individual plays by multiplying the variance of the 'bug-level' fitness by the number of plays of each bug (i.e. by 5). The null model was run for 40 generations.
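The variance correction follows from the fact that a mean of n independent plays has variance σ²/n, so σ² = n × Var(bug-level fitness). The sketch below shows this and a null-model draw for one generation. It assumes that each of the five plays is drawn separately and then averaged. Truncation at zero or at the 5-second time-out is not modelled, and the example parameter values in the test are illustrative only.

```python
import random
import statistics


def individual_play_sd(bug_level_fitness, plays=5):
    """SD of single plays estimated from bug-level fitness (each a mean of `plays` plays).

    The mean of n independent plays has variance sigma^2 / n, so sigma^2 = n * Var(bug-level).
    """
    return (plays * statistics.variance(bug_level_fitness)) ** 0.5


def simulate_control_generation(pop_mean, play_sd, n_bugs=128, plays=5, rng=None):
    """Null-model fitness for one generation: each bug's score is the mean of `plays`
    Gaussian 'capture times', independent of its pattern (no truncation applied)."""
    rng = rng or random.Random()
    return [statistics.fmean(rng.gauss(pop_mean, play_sd) for _ in range(plays))
            for _ in range(n_bugs)]
```

Feeding these random scores into the same selection step as the real game gives the baseline against which directional change can be judged.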

Quantification and statistical analysis
We analysed the patterning of the targets using custom-written scripts in ImageJ (version 1.51k) [30]. These scripts first calculated the mean, minimum and maximum luminance of each target, and the standard deviation of the luminance. We also calculated the contrast of the target as the coefficient of variation in luminance (the standard deviation divided by the mean). We then used Gabor filtering, which measures signal strength at different angles and spatial frequencies in a biologically plausible way, to quantify these signals on the targets [31-33]. We analysed four angles (vertical, horizontal and two diagonals), each at four different spatial frequencies (sigma values of 2, 4, 8 and 18 pixels). For each of these conditions, we calculated the standard deviation of the Gabor-convolved pixel values as a measure of the "energy" at that particular angle and spatial frequency. Finally, we also measured the standard deviation of Gabor- […] included as a second order fixed effect to account for non-independence in capture time between generations, and population (fast, medium or slow) was also included as a fixed effect. Replicate ID was included as a random effect. Model AIC values were compared to determine which metrics best predicted capture times within different categories: for luminance metrics, this was the standard deviation of the luminance; for vertical stripes, a sigma value of 4; for horizontal stripes, a sigma value of 2; for diagonal stripes, a sigma value of 2 (with both diagonal directions pooled together); and for edge metrics, a sigma value of 8 for the right-hand edge. In all of these cases, the measure was a highly significant predictor of average fitness (p < 0.001 for all metrics).
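The contrast and Gabor "energy" metrics described at the start of this section can be sketched in plain Python. This does not reproduce the ImageJ scripts. The Gabor wavelength (taken here as 2 × sigma), the kernel radius (3 × sigma), the removal of the kernel's mean, and wrap-around image edges are all assumptions. With these conventions, `theta = 0` responds to vertical stripes.

```python
import math
import statistics


def coefficient_of_variation(pixels):
    """Contrast as SD / mean of luminance (population SD is an assumption)."""
    values = [v for row in pixels for v in row]
    return statistics.pstdev(values) / statistics.fmean(values)


def gabor_kernel(sigma, theta, wavelength=None):
    """Even-symmetric Gabor kernel; theta = 0 responds to vertical stripes.
    Assumption: wavelength = 2 * sigma unless given; the paper does not state it."""
    if wavelength is None:
        wavelength = 2.0 * sigma
    half = int(math.ceil(3 * sigma))
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    # Remove the DC component so uniform regions give zero response.
    mean = statistics.fmean(v for r in kernel for v in r)
    return [[v - mean for v in r] for r in kernel]


def gabor_energy(pixels, sigma, theta):
    """SD of Gabor-filtered values. Edges wrap around (an assumption). The kernel is
    point-symmetric, so this correlation equals a convolution."""
    k = gabor_kernel(sigma, theta)
    half = len(k) // 2
    h, w = len(pixels), len(pixels[0])
    out = []
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    acc += k[dy + half][dx + half] * pixels[(i + dy) % h][(j + dx) % w]
            out.append(acc)
    return statistics.pstdev(out)
```

The nested loops keep the example dependency-free but are slow for large sigma values on a full 75 × 100 target. An array library would normally be used in practice.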
An example of the model structure used is as follows:
lmer(log(Fitness) ~ poly(Generation, 2) + Population + scale(SD) + (1|Replicate))
First, we modelled whether there was a change in fitness across generations and populations in our experimental data. We fitted a model with the log of fitness as the dependent variable, and the second order effect of generation and the first order effect of population as fixed factors. Replicate number was included as a random intercept. We then compared the change in fitness of our targets across generations between the Eden Project data and the null data, allowing us to test whether fitness improved in our experimental populations relative to a null baseline. To do this, we fitted a model similar to the previous one, but also included a variable indicating whether the data belonged to a null or an experimental population ('control'). The interaction between generation number and the 'control' variable was also included as the key interaction determining whether the increase in […] replicate, we fitted a multiple linear regression with logged fitness as the dependent variable and the five normalised camouflage metrics as independent variables. Normalising the camouflage metrics ensured that the selection rates for each could be directly compared. We then took the linear regression coefficients for each metric as the linear selection rates. We used these to test for differences in linear selection rates between speed populations and over evolutionary time (generations). We fitted linear mixed effect models with the linear regression coefficients for each metric as the dependent variable, testing against the second order fixed effect of generation and the fixed effect of population. Replicate ID was included as a random effect.
An example of the model structure used was as follows:
lmer(SD_β ~ poly(Generation, 2) + Population + (1|Replicate))
Significance tests for all models were carried out using the 'Anova' function from the 'car' package (version 3.0-2) [37], which was used to calculate Type II ANOVAs. Where relevant, post-hoc comparisons were carried out with the 'emmeans' package (version 1.3.4) [38].
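The per-generation selection rates described above (coefficients from regressing log fitness on z-scored camouflage metrics) can be sketched without external libraries. This is an illustration of ordinary least squares via the normal equations. The metric names and data in any call are placeholders, and z-scoring with the population standard deviation is an assumption about how the metrics were "normalised".

```python
import math
import statistics


def standardise(values):
    """z-score a metric so that selection rates are comparable across metrics."""
    m, s = statistics.fmean(values), statistics.pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def selection_gradients(fitness, metrics):
    """OLS coefficients of log(fitness) on standardised metrics for one generation.

    fitness: mean capture times for the bugs of one population in one generation.
    metrics: dict of metric name -> values, in the same bug order as fitness.
    """
    names = list(metrics)
    cols = [standardise(metrics[name]) for name in names]
    y = [math.log(f) for f in fitness]
    X = [[1.0] + [col[i] for col in cols] for i in range(len(y))]  # intercept column first
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    beta = solve(xtx, xty)
    return dict(zip(names, beta[1:]))  # drop the intercept
```

Collecting these coefficients for every generation, population and replicate gives the response variable for the `lmer(SD_β ~ ...)` models above.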

Modelling methods
Motion modelling was carried out using a MATLAB implementation of a motion model based on a two-dimensional array of correlation-type elementary motion detectors (as described in [39]) [40,41]. For each "fast" bug in generation 0 (512 bugs in total, 128 from each of the four replicates), we generated a short movie in which the bug initially moved on an upward trajectory and then rotated to move on a trajectory 15 degrees to the right (see supplementary material for an example). We used the generation 0 bugs because these should display a wide range of randomly selected pattern types. We used the "fast" population because selection seemed to be strongest on these targets, suggesting that it should show the largest differences in fitness. The time constant (tau) was 3, the spacing between receptors was 50, the filter size was 30, and the standard deviations of the Gaussians (used for Difference of Gaussians spatial filtering) were 3 and 5.
For each bug, several metrics were calculated from the output of the motion model, after removing zeros (places in the image where no motion signal was observed). First, the mean resultant length of the circular direction data was calculated as a measure of motion coherence. Second, the average vector length was calculated as a measure of motion energy. Finally, the bias was calculated as the difference between the circular mean and the "veridical" trajectory of the target (assumed to be the average of the two directions in which the target moved during the trial). All circular statistics were calculated using CircStat [42].
Modelling was carried out using linear models, with the log of fitness as the dependent variable and coherence (mean resultant length), bias (circular mean difference) and motion energy (average vector length) as predictors.
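The core of a correlation-type elementary motion detector can be illustrated in one dimension. This is a minimal sketch of the general Hassenstein-Reichardt principle, not the MATLAB model used here. It omits the two-dimensional receptor array, the Difference of Gaussians prefiltering, the receptor spacing and the filter size. It reuses tau = 3 as the time constant of a first-order low-pass "delay", assumed here to be in frames. In a two-dimensional array, horizontal and vertical detector outputs at each location form a motion vector, whose direction and length feed the metrics above.

```python
def lowpass(signal, tau, dt=1.0):
    """First-order low-pass filter used as the delay arm (tau in frames, an assumption)."""
    alpha = dt / (tau + dt)
    out, y = [], 0.0
    for s in signal:
        y += alpha * (s - y)
        out.append(y)
    return out


def reichardt(left, right, tau=3.0):
    """Opponent correlation-type EMD for two neighbouring receptors.

    Each half multiplies the delayed signal of one receptor by the undelayed signal of
    the other. Subtracting the mirror half gives a signed output that is positive for
    left-to-right motion and negative for right-to-left motion.
    """
    dl, dr = lowpass(left, tau), lowpass(right, tau)
    return [a * r - b * l for a, r, b, l in zip(dl, right, dr, left)]
```

A brief flash reaching the left receptor two frames before the right one produces a net positive summed response. Swapping the receptors flips its sign.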
The interaction between coherence and bias was also included, in line with predictions [19]. Finally, the data were filtered to include only the points with a circular mean difference of less than 60 degrees; the results were not qualitatively different if these excluded data points were included. To test whether patterning metrics could predict the motion model output variables, we fitted linear models with either the bias or the motion energy as the dependent variable, and either the standard deviation of the bug luminance or a metric of "stripiness" (the energy for vertical filtering angles with a sigma value of 4) as the independent variable.
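The three per-bug metrics and the 60-degree filter can be sketched from the motion-vector output. The sketch computes the quantities named in the text (mean resultant length, mean vector length, circular mean difference) with standard circular-statistics formulas rather than CircStat itself. Input vectors are assumed to have zeros already removed. The angle convention is an assumption: if 90° is straight up and clockwise turns reduce the angle, the veridical direction for the upward-then-15°-right movie would be 82.5°.

```python
import math


def circular_metrics(vectors, veridical_deg):
    """Coherence, energy and bias from (vx, vy) motion vectors (zeros already removed).

    coherence: mean resultant length of the vector directions (0 to 1).
    energy: mean vector length.
    bias: absolute angular difference (degrees) between circular mean and veridical_deg.
    """
    angles = [math.atan2(vy, vx) for vx, vy in vectors]
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    coherence = math.hypot(c, s)
    circ_mean = math.degrees(math.atan2(s, c))
    energy = sum(math.hypot(vx, vy) for vx, vy in vectors) / len(vectors)
    # Wrap the difference into [-180, 180) before taking its magnitude.
    bias = abs((circ_mean - veridical_deg + 180.0) % 360.0 - 180.0)
    return coherence, energy, bias


def keep_for_model(bugs, max_bias=60.0):
    """Keep bugs whose bias (circular mean difference) is below max_bias degrees."""
    return [b for b in bugs if b["bias"] < max_bias]
```

Wrapping the difference matters near ±180°: a circular mean of -170° against a veridical 170° is a 20-degree bias, not 340 degrees.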