Inspecting Morphological Features of Mosquito Wings for Identification with Image Recognition Tools

Mosquitoes are important disease vectors. Different mosquito genera are associated with different diseases at varying levels of specificity. Hence, quick and low-cost methods of identification, even if relatively coarse and only to genus level, would be useful in assessing risk and informing mitigation measures. Here we assess the extent to which digital photographs of mosquito wings taken with common cell phone cameras and clip-on lenses can be used to discriminate among mosquito genera when fed into image feature extraction algorithms. Our results show that genera may be distinguished on the basis of features extracted using the SURF algorithm. However, we also found that the naïve features examined here require very standardized photography and that different phone cameras have different signatures that may need to be taken into account.


Aedes prefer water high in organic compounds, such as that found in polluted water, septic tanks and gutters [17], although it has recently been discovered that several Anopheles species can also tolerate and breed in polluted water [18,19]. Different properties and parameters of water, such as pH, hardness, temperature, chemical composition and the presence of bacterial fauna, may all influence which mosquito species choose to breed in which locations. Thus, the condition and quality of water in an area directly affect which mosquito species are present.

Mosquito-borne disease
Mosquitoes play an important role in the transmission of disease and can be vectors for pathogenic bacteria, viruses and parasites. Mosquito-borne diseases such as malaria, dengue, yellow fever and West Nile virus cause millions of deaths a year [5,6]. Not all mosquitoes are known disease vectors, and many diseases are only carried by a specific genus or species.

Table 2

Monitoring vectors is an important part of controlling and possibly preventing these diseases. Because strains of pathogens are often carried by specific mosquito species or genera [6,30], an essential aspect of monitoring disease vectors is to keep track of the distribution and abundance of

Image database creation
Photographs of mosquito wings were uploaded to the website "Flickr" (http://www.flickr.com) using an account created to store orchid and butterfly photographs from previous research. All mosquito wing photographs were uploaded to the folder "mosquito wings." Flickr was used to store the photographs because the website allows users to add metadata to pictures and because it is accessible through an application programming interface (API), allowing easy retrieval of the photographs along with their metadata. Metadata were added in the form of "tags." The tags "genus:<name>," "species:<name>," "sex:<name>" and "project:mosquitoes" were given to each photograph. A small set of 37 mosquito wings was photographed with two different cameras. These photographs were given the tag "phone:S5" or "phone:G4," depending on which camera was used. Part of the dataset was collected during fieldwork in South Africa. These photographs were given the tag "location: SA."

Preparing and photographing
Because the ultimate goal of this project is to create an app that anyone can use on a smartphone or tablet, all photographs of mosquito wings were taken with a smartphone. My personal smartphone, a Samsung Galaxy S5 Neo, was the main device for this project. A subset of photographs was taken with a Motorola Moto G4 Plus, provided by Dr. Maarten Schrama, to see how the identification program responds to different cameras. Mosquito wings are on average only a few millimetres long and cannot be photographed in detail with the cameras in current smartphones. A clip-on macro lens attachment was therefore used to increase detail and reduce focusing distance. The same attachment ("Clip on lens set for smartphones," Hema) was used for all photographs. Macro lens attachments for smartphones are cheap and readily available both in physical shops and online. It is therefore presumed that the need for this attachment will not diminish the functionality of a potential future app.
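Because every photograph carries "key:value" tags (genus, species, sex, project, and for some photographs phone or location), the metadata can be turned into labels for analysis with very little code. The sketch below is a minimal illustration, assuming the tags have already been retrieved as the raw strings listed under Image database creation (the retrieval from Flickr itself is not shown); the function names `parse_tags` and `group_by` are hypothetical and are not part of OrchID or the Flickr API.

```python
from collections import defaultdict

# Keys used by the tagging scheme described for the Flickr photographs.
KNOWN_KEYS = {"genus", "species", "sex", "project", "phone", "location"}


def parse_tags(tags):
    """Turn a list of 'key:value' tag strings into a metadata dict.

    Tags without a colon or with an unrecognised key are ignored.
    Whitespace around keys and values is stripped, so 'location: SA'
    and 'location:SA' give the same result.
    """
    metadata = {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        key = key.strip().lower()
        if not sep or key not in KNOWN_KEYS:
            continue
        metadata[key] = value.strip()
    return metadata


def group_by(photos, key):
    """Group photo ids by one metadata field, e.g. 'genus' or 'phone'.

    photos maps a photo id to its list of tag strings; photos that
    lack the requested key are left out.
    """
    groups = defaultdict(list)
    for photo_id, tags in photos.items():
        meta = parse_tags(tags)
        if key in meta:
            groups[meta[key]].append(photo_id)
    return dict(groups)


# Hypothetical photo ids and tags, for illustration only.
photos = {
    "p1": ["genus:Culex", "species:pipiens", "project:mosquitoes", "phone:S5"],
    "p2": ["genus:Culex", "species:pipiens", "project:mosquitoes", "phone:G4"],
    "p3": ["genus:Anopheles", "project:mosquitoes", "location: SA"],
}
print(group_by(photos, "phone"))  # {'S5': ['p1'], 'G4': ['p2']}
```

Grouping by "phone" in this way isolates the 37-wing subset photographed with both cameras, which is the comparison discussed later for the S5 and G4 results.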

Mosquitoes were placed under a stereoscope and had their right wing removed using tweezers. Wings were then placed on a white sheet of paper with a colour calibration chart and a 5-mm scale bar printed on it (Figure 2). The smartphone with lens attachment was suspended 2 cm above the mosquito wing. Initially this was done by placing the phone on two Styrofoam blocks; later, two sponges were used instead. The camera function of the phone was selected and the camera was focused on the wing by tapping the appropriate spot on the screen. As pressing the shutter button can introduce camera shake, especially at macro scale, a timer was used to take the photograph. Photographs were transferred to a personal laptop (Acer Aspire ES15) using a USB cable and then uploaded to the Flickr account.

attached to the body. This set was used to test how accurate image identification is when a whole mosquito (or part of one) is visible in the image. As depth of field and lighting differed greatly between individuals, depending on how each individual was positioned during preservation, the colour and scale grid were omitted from these photographs.
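For the wing photographs taken against the printed 5-mm scale bar, any length measured in pixels can be converted to millimetres by using the bar in the same photograph as a ruler. The sketch below assumes the two endpoints of the scale bar and two points on the wing have already been located (for example by hand in an image viewer); all coordinates are hypothetical, and the conversion assumes the wing lies flat in the same plane as the bar and that lens distortion is negligible.

```python
import math

SCALE_BAR_MM = 5.0  # length of the scale bar printed next to each wing


def distance_px(p, q):
    """Euclidean distance in pixels between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def px_to_mm(length_px, bar_start, bar_end, bar_mm=SCALE_BAR_MM):
    """Convert a pixel length to millimetres using the scale bar in the same photo."""
    bar_px = distance_px(bar_start, bar_end)
    if bar_px == 0:
        raise ValueError("scale bar endpoints coincide")
    return length_px * bar_mm / bar_px


# Hypothetical coordinates: the 5-mm bar spans 500 px, i.e. 100 px per mm.
bar = ((100, 900), (600, 900))
wing_base, wing_tip = (150, 400), (330, 640)
wing_px = distance_px(wing_base, wing_tip)  # 300.0 px
print(px_to_mm(wing_px, *bar))              # 3.0
```

Because the ratio is recomputed per photograph, it also absorbs small differences in camera height between sessions (Styrofoam blocks versus sponges), provided the bar is in focus.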

Specimen collections
Four different sets of mosquito specimens were used in this project. All photographed mosquitoes were either frozen or dried; no mosquitoes preserved in liquid were used. The first set was provided by Dr. Maarten Schrama and came from the Institute of Environmental Sciences. This set was mainly used for practice and trial runs and was the only set not photographed with the scale bar and colour calibration chart; instead, a plain white piece of printer paper provided the background. All mosquitoes in this set were collected and preserved the previous year. Culex pipiens individuals were caught at Hortus Botanicus, Leiden. All other individuals were caught in South Africa. Identification of the South African specimens was not complete, and some individuals may not have been classified accurately to species. All genus identifications, however, were accurate.

The second set of mosquitoes was provided by Wageningen University. The mosquitoes were raised in a laboratory setting. How long these mosquitoes had been preserved is unknown.

The third set came from Naturalis Biodiversity Center. This was a museum collection consisting of pinned individuals that had been preserved for many years.

OrchID analyses
Initial test runs of the BGR and SURF-BOW algorithms were performed with small datasets on a personal laptop (Acer Aspire ES15). All required programs and modules were installed by Saskia de Vetter. Commands were run using Python 2.7. The laptop was not able to run the Bag-Of-Words algorithm on features extracted from a large dataset because of memory limitations. Final analyses with the complete dataset were therefore performed on a remote server with greater processing speed and working memory. This server was accessed using the program PuTTY Suite

SURF analysis
The remote server was used to analyse 475 mosquito wing photographs with the SURF algorithm. The resulting descriptions dictionary file was processed by the Bag-Of-Words algorithm, which produced a dataset of clustered features for analysis in R. PCA plots were made of this dataset to visualize feature similarities at both genus (Figure 5) and species (Figure 6) level. The procedure was kept identical to the PCA analyses of the BGR dataset: no scaling was used, and principal component 1 was plotted on the x-axis and principal component 2 on the y-axis.
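The pipeline above (SURF descriptors, a Bag-Of-Words step that clusters them, then an unscaled PCA) can be sketched in a few lines. This is a minimal illustration, not OrchID's own code: the SURF step is not reproduced, and random arrays of length 64 (the length of a standard SURF descriptor) stand in for its output. The k-means vocabulary, the normalisation of each histogram to proportions (the original pipeline may use raw counts), and all function names are assumptions. The PCA centres the columns but does not scale them, which corresponds to R's `prcomp(x, scale. = FALSE)`; the Discussion states that 100 clusters were used, whereas the toy example uses 10.

```python
import numpy as np


def assign(X, centroids):
    """Index of the nearest centroid for every row of X (squared Euclidean distance)."""
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, which avoids an N x k x D temporary array
    d2 = ((X ** 2).sum(axis=1)[:, None]
          - 2.0 * X @ centroids.T
          + (centroids ** 2).sum(axis=1)[None, :])
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns the k centroids (the visual vocabulary)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # keep the old centroid if a cluster ends up empty
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids


def bow_histograms(descriptors_per_image, centroids):
    """One row per image: fraction of its descriptors assigned to each visual word."""
    k = len(centroids)
    rows = []
    for desc in descriptors_per_image:
        counts = np.bincount(assign(desc, centroids), minlength=k).astype(float)
        rows.append(counts / max(counts.sum(), 1.0))
    return np.vstack(rows)


def pca_unscaled(data, n_components=2):
    """PCA on centred but unscaled columns; column 0 = PC1 scores, column 1 = PC2 scores."""
    centred = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


# Hypothetical stand-in for SURF output: 6 images, each with 40-80 descriptors of length 64.
rng = np.random.default_rng(1)
images = [rng.normal(size=(int(rng.integers(40, 81)), 64)) for _ in range(6)]
vocabulary = kmeans(np.vstack(images), k=10)
hist = bow_histograms(images, vocabulary)
scores = pca_unscaled(hist)
print(hist.shape, scores.shape)  # (6, 10) (6, 2)
```

The signs of principal components are arbitrary, so two correct implementations may produce mirror-image plots; only the relative positions of points are meaningful.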

BGR and SURF
PCA plots of neither the BGR nor the SURF analyses showed clear clustering of taxonomic groups, although genera appeared somewhat more clustered than species. The two datasets were also combined so that a PCA plot could be made of both algorithms together. This, however, did not improve the results. The BGR algorithm was set to divide photographs into 50 horizontal and 50 vertical bins, and the Bag-Of-Words algorithm was set to create 100 clusters from the SURF dataset. Adjusting these settings may produce more accurate clustering of groups. Removing the scale bar and colour gradient from the photographs could also make the analysis more accurate for these particular algorithms. The BGR and SURF algorithms were run with a Region of Interest (ROI) in an attempt to test this hypothesis, but the region had to be defined by width and height in pixels and the chosen region was not shown visually. This made it impossible to check whether all of the scale bar and colour gradient had been removed while the whole wing was still in the frame, as positioning varied somewhat between photographs. Because of this, ROI was left out of the final analyses. In previous research, BGR proved to be more useful than SURF in orchid identification, but SURF was more effective in butterfly identification. SURF also appeared to perform slightly better than BGR on the mosquito wing dataset, at least at genus level. Mosquito wings seem to differ mostly in size and shape and in areas of high contrast (black and white spots or scales), making an algorithm that analyses pixel intensity potentially more effective than one that analyses colours.

Anopheles species exhibit a spotted or banded pattern created by areas of black scales (Figure 9), which are absent in Aedes and Culex. The latter two may, however, be distinguished by their shape: the wings of most photographed Culex species appear more rounded (Figure 11), whereas Aedes wings appear thinner and straighter (Figure 10). Differences between species are less obvious, but some species differ greatly in wing size from other species in the same genus (Figures 9, 10, 11). Measuring wing size as a means of identification can be tricky, however, as wing size may vary greatly even within the same species (Figure 12).

there is considerable discrepancy in the features found. The two sets are clearly clustered in both the BGR and the SURF results, and there is no overlap between points. The different results from the two smartphones suggest that some hurdles may need to be overcome if this method of analysis is to be used to create a universal app. Upon visual inspection (Figure 12), differences between photographs of the same specimen can be clearly seen: the photograph taken with the Motorola G4 camera appears more blurred than the one taken with the Samsung S5 camera. This indicates that not all phones may be able to focus effectively at the minimal distances needed for macro photography. Although the photographs were taken on the same day, at the same time and under the same artificial light, an inspection of three randomly chosen background points with the pipette tool in Adobe Photoshop CS2 also shows a difference in overall hue, saturation, brightness and colour balance (Table 3). Differing colour values can influence the BGR algorithm, and differing levels of saturation and brightness can cause the SURF algorithm to detect different levels of pixel intensity. Camera make and the type of processor in the device may contribute to slight visual differences that can cause difficulties for image recognition software.
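The colour differences between the two phones (Table 3) matter most for a colour-based feature such as BGR. The sketch below illustrates this under two explicit assumptions. First, it reads "50 horizontal and 50 vertical bins" as the mean B, G and R values of each of 50 horizontal strips and 50 vertical strips; this is one plausible interpretation, not a description of OrchID's actual implementation. Second, it models the camera difference as a simple per-channel gain (a colour cast) and corrects it with a white-patch normalisation against a region of blank paper, one way the background or colour gradient could be used to equalize colour levels. Function names, the patch location and the image data are hypothetical.

```python
import numpy as np


def bgr_bin_means(img, bins=50):
    """Mean B, G and R values for each horizontal strip and each vertical strip.

    img: H x W x 3 array in BGR channel order (the order cv2.imread returns).
    Returns a flat vector of length 2 * bins * 3.
    """
    h, w, _ = img.shape
    if h < bins or w < bins:
        raise ValueError("image is smaller than the number of bins")
    row_groups = np.array_split(np.arange(h), bins)  # row indices of each horizontal strip
    col_groups = np.array_split(np.arange(w), bins)  # column indices of each vertical strip
    horizontal = [img[r].reshape(-1, 3).mean(axis=0) for r in row_groups]
    vertical = [img[:, c].reshape(-1, 3).mean(axis=0) for c in col_groups]
    return np.concatenate(horizontal + vertical)


def white_patch_normalise(img, patch, target=255.0):
    """Rescale each channel so that a patch of blank paper averages `target`.

    patch: (y0, y1, x0, x1) bounds of a region assumed to contain only white paper.
    """
    y0, y1, x0, x1 = patch
    reference = img[y0:y1, x0:x1].reshape(-1, 3).mean(axis=0)
    gains = target / reference  # one multiplicative gain per channel
    return np.clip(img.astype(float) * gains, 0.0, 255.0)


# Hypothetical example: white paper with a darker, textured "wing" block.
rng = np.random.default_rng(0)
reference_img = np.full((200, 300, 3), 255.0)
reference_img[60:140, 80:220] = rng.uniform(40, 120, size=(80, 140, 3))

# Simulate a second camera that applies a per-channel colour cast.
cast = np.array([0.85, 0.95, 0.90])
second_camera = reference_img * cast
paper_patch = (0, 40, 0, 40)  # top-left corner, assumed to be blank paper

features_ref = bgr_bin_means(reference_img)
before = np.abs(features_ref - bgr_bin_means(second_camera)).max()
corrected = white_patch_normalise(second_camera, paper_patch)
after = np.abs(features_ref - bgr_bin_means(corrected)).max()
print(before, after)
```

In this toy case the correction removes the feature difference almost entirely because the simulated cast is a pure per-channel gain. Real differences between phones also include blur (as seen with the G4) and non-linear in-camera processing, which a white-patch correction does not address.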

OrchID as an app
The aim of this study was to 1) test whether mosquito wings are morphologically distinct enough to allow image recognition software to recognize features unique to a taxon, and 2) take the first steps in investigating whether and how the OrchID program may be used as a universal identification application that could be downloaded on mobile devices. Although the BGR and SURF results were far from perfect, some clustering did become apparent, especially at genus level for the SURF algorithm. This shows that taxonomic groups of mosquitoes do indeed possess unique qualities recognizable by image recognition software. The landmark-based study by Wilke et al. (2016) also shows clear differences between genera in wing shape and size. The two algorithms used in this study did not measure these two parameters, but additional algorithms could be added to the OrchID program to make the automated identification process more effective. The scale bar included in this study could be part of a standardized method of wing photography that would allow a machine learning tool to measure different dimensions within a photograph. The colour gradient could also be used to equalize colour levels across all photographs to make the BGR algorithm more effective, something that was not done in the current study. By refining and standardizing mosquito wing photography and by adding algorithms that allow the program to make automated measurements of structures in a photograph, OrchID may become more precise at identifying taxonomic mosquito groups based on