Recognition of natural objects in the archerfish

Recognition of individual objects and their categorization is a complex computational task. Nevertheless, visual systems are able to perform this task in a rapid and accurate manner. Humans and other animals can efficiently recognize objects despite countless variations in their projection on the retina due to different viewing angles, distance, illumination conditions, and other parameters. Numerous studies conducted in mammals have associated the recognition process with cortical activity. Although the ability to recognize objects is not limited to mammals and has been well-documented in other vertebrates that lack a cortex, the mechanism remains elusive. To address this gap, we explored object recognition in the archerfish, which lack a fully developed cortex. Archerfish hunt by shooting a jet of water at aerial targets. We leveraged this unique skill to monitor visual behavior in archerfish by presenting fish with a set of images on a computer screen above the water tank and observing the behavioral response. This methodology served to characterize the ability of the archerfish to perform ecologically relevant recognition of natural objects. We found that archerfish can recognize an individual object presented under different conditions and that they can also categorize novel objects into known categories. Manipulating features of these objects revealed that the fish were more sensitive to object contours than texture and that a small number of features was sufficient for categorization. Our findings suggest the existence of a complex visual process in the archerfish visual system that enables object recognition and categorization.


For their survival, many animal species require the computational capacity to perform a range of complex object recognition tasks, from identifying a conspecific to recognizing a camouflaged [...] faces, are not ecologically relevant to birds, insects, or fish, nor do we expect fish to possess specific brain areas dedicated to face processing, as is the case for humans. Yet, these findings suggest the existence of a complex visual processing system in the brain that allows for the extraction of the relevant features of an object, its recognition and categorization.

To address these questions concerning the nature of object recognition in non-mammalian vertebrates, we examined the recognition of natural objects in the archerfish (Toxotes chatareus). The rationale for selecting the archerfish draws, in part, on the potential benefits of [...] that settle on the foliage above the water line with a jet of water from the mouth (Lüling, 1963), these fish can be trained to perform an object recognition task and essentially report their decisions using stimuli in the lab. Thus, the archerfish can provide the fish equivalent of a discriminative response by a monkey or by a human when performing a recognition task with a click of a button.

Training. After a period of acclimatization, inexperienced fish were gradually trained to shoot at targets presented on a computer screen (VW2245-T, 21.5", BenQ, Taiwan) situated 35±2 cm above the water level. In the first stage, the fish were trained to shoot at a single black circle on a white background that appeared at random locations on the screen. A blinking black square appeared in the middle of the screen immediately prior to the display of the target and was used as a cue to draw the fish's attention upward. If the fish shot at the target within 15 seconds of its appearance, it was rewarded with a food pellet. Otherwise, the target disappeared and the next training trial started. The mean response time of the fish ranged from 2 to 10 seconds. The training continued until the fish succeeded in hitting 80% of the targets within 15 seconds.

After the fish learned to shoot at targets on the screen, they were trained to recognize either a specific object or a category through a two-alternative forced choice procedure (Fig. 2A). A session of 20 trials where the fish had to choose and shoot at one of two images was repeated over several (4-10) days to familiarize the fish with the experiment. When the fish achieved a 70% success rate in choosing the designated object or category, it was considered trained and [...]

All images were preprocessed using Matlab. All background colors were removed and the objects, after being converted to grayscale, were placed on a white background. The size of the objects was randomized in the following way: the object size was drawn uniformly from a discrete set of sizes. For this purpose, the images were resized to create 5 levels of object area, defined as the number of pixels within the contour of the object: ~10,000 pixels, ~50,000 pixels, 100,000 pixels, 200,000 pixels and 300,000 pixels.

There were two types of targets. First, an image of a specific spider was presented to the fish together with a distracting object. The distracting objects were leaves, flowers, insects, or other spiders. The target spider was shown from different viewpoints, with different orientation, size, contrast, and screen locations. All presentation parameters were randomized.

[...] The priors for [...] and [...] were chosen to be uniform and very broad. We used JAGS (Plummer, 2003) [...]

The fish were able to recognize and choose the target spider, both on trials where the second object was not a spider and also against other spiders (Fig. 3B) [...]

A similar experiment was conducted with an ant as a target image. The same three fish were retrained to recognize one specific ant that was shown together with other objects, and sometimes with other ants (Fig. 3C). The fish learned to differentiate the target ant from the other objects and also from other ants (Fig. 3D). The success rates in this experiment were not significantly different from the rates in the experiment with a spider target.

The archerfish can categorize objects into classes and learn to generalize from examples

We tested the ability of the fish to discriminate between the images of two categories of stimuli [...] were never repeated; that is, each image was used only once (around 1,500 images in total were used in the experiments). After two to eight days, the success rate of the fish reached a plateau that was significantly above chance level (Fig. 4B). The lower boundary of the 95% HDI for the fish with the lowest success rate was just above 60%. The upper boundary of the 95% HDI for the fish with the highest success rate was above 80%.
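As an illustration of how a 95% HDI on a success rate can be compared with the 50% chance level of a two-alternative task, the following minimal sketch assumes a simple binomial model of correct choices with a uniform prior on the success rate. This is a simplification and not the JAGS model used in the study; the trial counts and the helper names `posterior_samples` and `hdi` are hypothetical.

```python
import random


def posterior_samples(successes, trials, n_samples=20000, seed=0):
    """Sample the posterior of a success rate.

    Assumption: binomial likelihood with a uniform Beta(1, 1) prior,
    so the posterior is Beta(successes + 1, failures + 1).
    """
    rng = random.Random(seed)
    failures = trials - successes
    return [rng.betavariate(successes + 1, failures + 1)
            for _ in range(n_samples)]


def hdi(samples, mass=0.95):
    """Return the narrowest interval spanning `mass` of the sorted samples."""
    ordered = sorted(samples)
    n = len(ordered)
    window = int(mass * n)  # index offset between the interval's two ends
    # Slide the window over the sorted samples and keep the narrowest one.
    start = min(range(n - window),
                key=lambda i: ordered[i + window] - ordered[i])
    return ordered[start], ordered[start + window]


# Hypothetical data: 140 correct choices out of 200 trials.
low, high = hdi(posterior_samples(140, 200))

# Performance counts as above chance when the whole HDI lies above 0.5.
above_chance = low > 0.5
```

Under these assumptions, a fish is considered to perform above chance only when the lower boundary of its HDI exceeds 0.5, which is the criterion applied to the boundaries reported above.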

To test whether archerfish are predisposed to shooting at animals rather than plants, we tested four additional fish, which were trained to shoot at the non-animal targets (i.e., non-edible). Again, we found that the archerfish were able to select the non-animal targets at a significantly higher level than chance (Fig. 4B). This is an indication that the archerfish is not hardwired to select an animal.

[...] We compared the performance of the support vector machine classifier trained on the raw images to a classifier trained on a feature matrix and found that the use of features significantly improved its performance. We also tried classifiers other than the support vector machine. There was no significant difference in their performance, so we continued with the support vector machine and features for the remainder of the analysis (Fig. 5C).

The classifier was built in an iterative manner, starting with the most informative feature; i.e., the feature with the highest success rate when used in the model separately, then adding the next most informative feature and so on, until the predictive value of the model became saturated. We used a standard training set, verification set and test set to avoid over-fitting the model. Although this was a greedy algorithm that could not guarantee an optimal solution, it still provided a lower bound for the optimal performance.
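The iterative procedure described above can be sketched as greedy forward selection. This is a minimal sketch of one reading of that procedure, not the authors' implementation: a nearest-centroid classifier stands in for the support vector machine to keep the example dependency-free, the stopping tolerance `tol` is a hypothetical parameter, and the data are synthetic.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def accuracy(train, valid, features):
    """Validation accuracy of a nearest-centroid classifier on `features`.

    `train` and `valid` are lists of (feature_vector, label), labels 0 or 1.
    Stand-in for the SVM used in the paper (an assumption of this sketch).
    """
    def pick(vec):
        return [vec[j] for j in features]

    cents = {c: centroid([pick(x) for x, y in train if y == c])
             for c in (0, 1)}

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    hits = 0
    for x, y in valid:
        v = pick(x)
        pred = min((0, 1), key=lambda c: dist2(v, cents[c]))
        hits += (pred == y)
    return hits / len(valid)


def greedy_forward_selection(train, valid, n_features, tol=0.01):
    """Add the feature that most improves validation accuracy, one at a
    time, and stop once the gain drops below `tol` (saturation)."""
    selected, best = [], 0.0
    remaining = list(range(n_features))
    while remaining:
        scored = [(accuracy(train, valid, selected + [j]), j)
                  for j in remaining]
        # Highest score wins; ties go to the lower feature index.
        score, j = max(scored, key=lambda t: (t[0], -t[1]))
        if score - best < tol:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best


# Synthetic data: feature 0 is strongly informative, feature 1 weakly,
# feature 2 is pure noise.
rng = random.Random(0)


def make(n):
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(3.0 * y, 1.0), rng.gauss(0.5 * y, 1.0),
             rng.gauss(0.0, 1.0)]
        data.append((x, y))
    return data


train, valid = make(400), make(400)
selected, score = greedy_forward_selection(train, valid, n_features=3)
```

Because each step only considers adding a single feature to the current set, the result may miss a better combination, which is why the performance obtained this way is a lower bound on the optimum. In a full analysis, the final feature set would be scored once on a held-out test set, which this sketch omits.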

To test the model (Fig. 5A), we used it to simulate the behavioral experiment. The recognition rate at the output stage of the model matched the behavioral success rate of the fish (Fig. 5D), indicating the capability of the model to capture the statistics of fish behavior. Next, we analyzed the model structure to reveal aspects of the fish's decisions.

Shape is more important than texture in archerfish object recognition

Fig. 5D shows that the model's success rate saturated with only the first five features, which describe an object's shape compactness (ratio of convex hull to area, and roundness), shape eccentricity, and texture (entropy and the local standard deviation). Using these 5 features, the model achieved a success rate of 94%, compared to a success rate of 95% with all 18 features.

We calculated the model's success rate given only the first two shape features; specifically, the ratio of the convex hull to the area together with eccentricity. The model's predictions were close to saturation, with a success rate of 92% (Fig. 5E). When given only the two most important texture features, entropy and the local standard deviation, the model's success rate was only 76%. This suggests that shape was more important than texture in the visual discrimination performed by these fish.
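To make two of these features concrete, the following sketch computes one shape feature (the ratio of convex hull area to object area) and one texture feature (gray-level entropy). The exact definitions used in the study are not given in this section, so both are assumptions: the hull is taken over pixel corners so that a filled rectangle yields a ratio of 1, and the entropy is the Shannon entropy of the gray-level histogram in bits. The function names are hypothetical.

```python
import math
from collections import Counter


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    total = 0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def hull_to_area_ratio(mask):
    """Convex hull area divided by object area for a binary mask.

    `mask` is a list of rows; truthy entries are object pixels. Each
    pixel is treated as a unit square, so its four corners enter the hull.
    """
    corners, area = [], 0
    for r, row in enumerate(mask):
        for c, on in enumerate(row):
            if on:
                area += 1
                corners += [(c, r), (c + 1, r), (c, r + 1), (c + 1, r + 1)]
    return polygon_area(convex_hull(corners)) / area


def gray_entropy(values):
    """Shannon entropy (bits) of the gray levels of the object pixels."""
    counts = Counter(values)
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


# A compact square versus an L-shaped object with a concavity.
square = [[1, 1], [1, 1]]
l_shape = [[1, 0], [1, 1]]
ratio_square = hull_to_area_ratio(square)
ratio_l = hull_to_area_ratio(l_shape)
```

With these definitions, a convex object has a ratio of 1 and the ratio grows as the outline becomes more concave or branched, as with legs and antennae, while the entropy depends only on the gray levels inside the object and not on its outline.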

To further test the prediction that shape features were more important than texture, we assessed the ability of the fish to perform object recognition after removing all textures and leaving only the silhouette of the image versus removing all the shape information and leaving only texture (Fig. 6A). The experimental procedure was identical to that used in the original categorization experiment.

We found that the fish were able to perform object discrimination between animals and foliage when provided only with the shape but failed to do so when provided only with texture (Fig. 6B). This result, a finding in itself, also increases our confidence in interpreting results from the model.

The lower bounds of the 95% HDI of the success rate for all fish and all types of targets were well above chance level (Fig. 7A), suggesting that at least part of the errors that the fish made were due to execution noise and not to an inability of the fish's object recognition algorithm to identify an object.

[...] made. When we trained the classifier based on the selection made by the fish, we found that it achieved almost perfect performance in predicting the true labels of the objects. Furthermore, it exhibited a hierarchy between features (Fig. 5D), suggesting that the fish attributed more importance to shape features than to texture features. The model also supported the hypothesis that classification errors were mainly due to execution noise and were not image specific. We tested these two hypotheses with additional experiments and confirmed them both.

[...] image true labels and for predicting fish behavior (Fig. 5C). Therefore, the choice of the classifier did not appear to significantly affect the results. In addition, naïve application of the support vector machine on the raw images, by trying to [...] success rate for all sets of objects was above chance level for all three fish that finished all sets.