A quantitative analysis of objective feather colour assessment: measurements in the lab are more reliable than in the field

The evolution of animal colouration is driven to a large extent by sexual selection operating on traits that transmit information to rivals and potential mates and that therefore have major impacts on fitness. Reflectance spectrometry has become a standard colour-measuring tool, especially since the discovery of tetrachromacy in birds and their ability to detect UV. Birds' plumage patterns may be invisible to humans, necessitating a reliable and objective way of assessing colouration that does not depend on human vision. Plumage colouration can be measured directly on live birds in the field or in the lab (e.g. on collected feathers). It is therefore essential to determine which sampling method yields more repeatable and reliable measures, and which of the available quantitative approaches best assesses the repeatability of these measures. Using a spectrophotometer, we measured melanin-based colouration in the plumage of barn swallows (Hirundo rustica). We assessed the repeatability of measures obtained with each of the two traditional sampling methods separately to determine their reliability quantitatively. We used the ANOVA-based method to calculate the repeatability of measurements from each of two years separately, and the GLMM-based method to calculate overall adjusted repeatabilities across both years. We repeated the assessment for the whole reflectance spectrum and for only the human-visible part, to assess the influence of the UV component on the reliability of each sampling method. Our results reveal very high repeatability for lab measurements and a lower, but still moderate to high, repeatability for field measurements. Both increased when the analysis was limited to the human-visible part, for all plumage patches except the throat, where we observed the opposite trend. Repeatability between sampling methods was quite low when the whole spectrum was included, but moderate when only the human-visible part was included.
Our results suggest higher reliability for measurements in the lab and higher power and accuracy of the GLMM-based method. They also suggest UV reflectance differences amongst different plumage patches.


Introduction
that measuring feather colouration in the lab will yield more repeatable and reliable measures, as it avoids the logistical, technical and animal welfare limitations imposed by the field method and provides more controlled conditions during the measurements. Also, we predict more realistic and accurate repeatability

Colour was quantified based on Endler and Mielke's [19] approach, using a USB2000 spectrophotometer (Ocean Optics, Dunedin, Florida) and a xenon flash lamp (Ocean Optics). Before using the spectrophotometer, we calibrated it by setting the white and dark references, i.e., we specified which colours should be treated as the 100% reflectance (white) standard and the 0% reflectance (dark) standard, so that all other colour measurements are expressed relative to those maximum and minimum possible reflectance values. We used a WS-1 SS Diffuse Reflectance Standard, a diffuse white plastic >98% reflective from 250-1500 nm, as the white reference (100% reflectance), and a piece of black velvet as the dark standard (0% reflectance) to correct for the noise when no light is reaching the sensor. At the far end of the reflection probe/light source, we put a non-reflective black pointer cut at a 45-degree angle, to avoid mismeasurement derived from the white light […] belly and vent of each bird. For the measurements of feather samples in the lab, we collected enough feathers from live birds to be able to mount them one on top of the other and simulate the original pattern found on live birds. We mounted the feathers on a piece of black velvet to avoid background noise.
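The calibration step described above amounts to rescaling each raw reading between the dark and white standards. The following Python sketch illustrates that arithmetic only; the function name and the detector counts are hypothetical, and the spectrometer software used in the study may implement the correction differently.

```python
def percent_reflectance(sample, white, dark):
    """Rescale raw readings to % reflectance between dark (0%) and white (100%).

    All three arguments are sequences of raw detector readings taken on the
    same wavelength grid (an assumption of this sketch).
    """
    if not (len(sample) == len(white) == len(dark)):
        raise ValueError("spectra must share the same wavelength grid")
    result = []
    for s, w, d in zip(sample, white, dark):
        span = w - d
        if span <= 0:
            raise ValueError("white reference must exceed dark reference")
        result.append(100.0 * (s - d) / span)
    return result


# Hypothetical raw counts at three wavelengths (illustrative values only).
dark = [100.0, 110.0, 105.0]
white = [2100.0, 2110.0, 2105.0]
sample = [600.0, 1110.0, 105.0]
reflectance = percent_reflectance(sample, white, dark)
```

A reading equal to the dark standard maps to 0% and a reading equal to the white standard maps to 100%, which is the sense in which all other measurements are determined relative to the two references.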

Once we had tested for the reliability of the measures obtained with both sampling methods separately, we averaged the three measurements for each method and used these average values to test the comparability between field and lab measurements.

We also used Endler and Mielke's [19] method to calculate brightness, chroma and hue, parameters generally used to specify a colour. Using their equations and the mathematical software Matlab (The MathWorks Inc., Natick, MA), we obtained the spectral sensitivity functions of the cones corrected for the cut points of oil droplets, calculated the quantal catch for each photoreceptor and converted those quantal catches into three-dimensional coordinates in a tetrahedral colour space (Fig. 1). Chroma is defined as the strength of the colour signal or the degree of difference in stimulation among the cones, and it is proportional to the Euclidean distance from the origin, that is, the distance from the bird grey (achromatic) point to each point, specified by three space coordinates. Perception of hue depends on which cones are stimulated, and in tetrahedral colour space, it is defined by the angle that a point makes with the origin. As bird colour space is 3D, hue is defined by two angles, analogous to latitude and longitude in geography [19].
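The geometric definitions of chroma and hue given above can be sketched in Python as a Cartesian-to-spherical conversion. This is an illustration under stated assumptions, not the Matlab code used in the study: the function name is hypothetical, the proportionality constant for chroma is taken as 1, and the choice of which axis defines "latitude" is an assumed convention that may differ from the one in [19].

```python
import math


def chroma_and_hue(x, y, z):
    """Return (chroma, hue_longitude, hue_latitude) for a colour-space point.

    (x, y, z) are coordinates relative to the achromatic origin.
    Angles are in radians. Axis orientation is an assumption of this sketch.
    """
    chroma = math.sqrt(x * x + y * y + z * z)
    if chroma == 0.0:
        # The achromatic point has no defined hue.
        return 0.0, None, None
    longitude = math.atan2(y, x)
    # Clamp to guard against floating-point ratios marginally outside [-1, 1].
    latitude = math.asin(max(-1.0, min(1.0, z / chroma)))
    return chroma, longitude, latitude


# A hypothetical point lying in the x-y plane.
chroma, lon, lat = chroma_and_hue(1.0, 1.0, 0.0)
```

Two points with the same pair of angles share a hue and differ only in chroma, which is why hue needs two angles while chroma needs a single distance.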

Brightness is defined as the summed mean reflectance across the entire spectral range (R_{300-700}; [44,45]). As well as these parameters, we included UV chroma, a measure of spectral purity, in our analysis, which was calculated as the proportion of reflectance in the UV part of the spectrum (R_{300-400}) in relation to the total reflectance spectrum (R_{300-700}; [46]). Cone sensitivities and oil droplet cut points were […]

Although all the avian families investigated show plumages reflecting significant amounts of UV light (see [51] for a review), in the particular case of barn swallows, ventral plumage shows a noisy reflectance pattern in the UV part of the spectrum and does not exhibit a clear ultraviolet reflectance peak (Fig. 2; [34]). For this reason, we calculated the same colour variables both including and not including the UV part of the reflectance spectrum, and carried out a repeatability assessment using either the whole reflectance spectrum (300-700 nm) or only the human-visible part (400-700 nm). When using only the […] plumage area for each patch.

In order to test for the reliability of both sampling procedures (described above) separately, we calculated the repeatability for colour variables in the four patches for the different procedures according to Lessells and Boag [43], Senar [52] and Quesada and Senar [39]. Repeatabilities were computed from
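The brightness and UV chroma definitions given earlier in this section can be illustrated with a short Python sketch. It assumes a plain sum of reflectance values over the stated range and treats the 400 nm boundary as belonging to both the UV band and the human-visible band; the function names and the example spectrum are hypothetical, and the exact binning used in the study is not specified here.

```python
def band_sum(wavelengths, reflectance, lo, hi):
    """Sum reflectance over wavelengths in [lo, hi] (nm, inclusive)."""
    return sum(r for w, r in zip(wavelengths, reflectance) if lo <= w <= hi)


def brightness(wavelengths, reflectance, lo=300, hi=700):
    """Total reflectance over the chosen range (whole spectrum by default)."""
    return band_sum(wavelengths, reflectance, lo, hi)


def uv_chroma(wavelengths, reflectance):
    """Proportion of total (300-700 nm) reflectance found at 300-400 nm."""
    total = band_sum(wavelengths, reflectance, 300, 700)
    if total == 0:
        raise ValueError("total reflectance is zero")
    return band_sum(wavelengths, reflectance, 300, 400) / total


# Hypothetical spectrum sampled every 50 nm (illustrative values only).
wl = [300, 350, 400, 450, 500, 550, 600, 650, 700]
refl = [2.0, 2.0, 2.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]

whole = brightness(wl, refl)             # whole spectrum, 300-700 nm
visible = brightness(wl, refl, lo=400)   # human-visible part, 400-700 nm
uv = uv_chroma(wl, refl)
```

Computing each variable once over 300-700 nm and once over 400-700 nm mirrors the two parallel analyses described above.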

Once we calculated the "within-method" repeatabilities, i.e., the repeatabilities for each sampling method, we averaged the three measurements per patch per individual and assessed the "between-

In order to have a general idea of repeatability for each patch, we included all the colour variables in a principal component analysis (PCA) and calculated the repeatability (and confidence intervals) for the first […] independent of each other (i.e. different components of the spectra, same metrics on different years, or […] All the statistical analyses were carried out using R [54,55].

Our field study did not involve any endangered or protected species. The specific locations of the study are provided as supporting material.

Birds were caught using mist nets under licence (AMcG BTO A licence Holder No.4947).

In 2009, when including the whole spectrum in the analyses, measuring plumage colouration in the lab proved to be a reliable method: brightness, UV chroma, chroma, and hue latitude and longitude were highly repeatable for all the patches, although hue latitude in the throat was less repeatable (r_i = 0.418, F_{21,44} = 3.157, P = 0.01; Table 1). The method of measuring the plumage colouration in the field (at three different points, covering a wider area of each patch) was also quite consistent but with overall lower values of repeatability, although still reasonably high, for all the variables and patches, being especially […] Table 3 and Table 4).
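The ANOVA-based repeatability reported above can be illustrated with a Python sketch of the standard one-way ANOVA intraclass calculation (among-individual variance divided by total variance), in the spirit of Lessells and Boag [43]. The function names and the toy data are hypothetical, and this is not the R code used in the study. The second function relies on the algebraic identity r = (F - 1) / (F - 1 + n0), where n0 is the number of measurements per individual in a balanced design.

```python
def anova_repeatability(groups):
    """Return (repeatability, F) from one-way ANOVA on per-individual groups.

    `groups` is a list of lists, one inner list of repeated measurements
    per individual. Requires at least two individuals and some variation.
    """
    a = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_among = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_among = ss_among / (a - 1)
    ms_within = ss_within / (n_total - a)
    # n0 corrects for unequal group sizes; it equals n when all sizes are n.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (a - 1)
    var_among = (ms_among - ms_within) / n0
    if ms_within + var_among == 0:
        raise ValueError("no variation in the data")
    r = var_among / (ms_within + var_among)
    f_ratio = ms_among / ms_within if ms_within > 0 else float("inf")
    return r, f_ratio


def repeatability_from_f(f_ratio, n0):
    """Recover repeatability from an F ratio and the group-size coefficient."""
    return (f_ratio - 1.0) / (f_ratio - 1.0 + n0)


# Toy data: three individuals, three measurements each (illustrative only).
r_toy, f_toy = anova_repeatability([[10, 11, 12], [20, 21, 22], [30, 31, 32]])

# Consistency check on the throat hue latitude value quoted above, assuming
# a balanced design with three measurements per individual (n0 = 3).
r_throat = repeatability_from_f(3.157, 3.0)
```

Under that assumption, the degrees of freedom (21, 44) correspond to 22 individuals measured three times each, and the reported F ratio reproduces the reported repeatability of 0.418 to three decimal places.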

A potential explanation for our findings is that the throat patch is smaller and much darker than the rest of the patches. The feathers of the throat patch are also considerably smaller. Therefore, it is often quite difficult to obtain a reliable reflectance measurement with so few photons reaching the spectrophotometer probe. Also, it is more difficult to create a "plumage patch" in the lab with a feather […] plumage patches directly on field birds were poorly repeatable compared to the values obtained for the same variables measured from feather samples in the lab, and non-significant in all cases. In 2010, in contrast, repeatabilities were higher and significant for certain metrics in certain patches only. These results stand in marked contrast to the positive results of another study, which compared the repeatabilities between both colouration assessment procedures for carotenoid-based plumage [39].

There may be several reasons for this difference: for example, due to the different characteristics of the two types of pigments, carotenoid-derived colouration is more variable among individuals than melanin-based colouration [56], and repeatability of a character increases with variability [52]. To increase the repeatability of some measurements, a possible solution could be to increase the number of measurements, for example from three to five, as has already been done by several authors [22,28,32].

Unfortunately, this is not an option when working with live birds in the field, as we would be increasing the manipulation times and, consequently, the stress levels to an unacceptable degree, although it can be applied when assessing colouration in the lab on feather samples [39].

Consequently, collecting feathers from birds and assessing their colouration in the lab, as well as being more convenient, minimising risk to a sensitive device like a spectrophotometer and reducing handling […] adding the year variance into the total variance calculation, so that we could obtain the adjusted […] estimate the overall repeatability within each patch for both methods separately and across methods.

The possibility of calculating adjusted repeatabilities by including year as a random factor, together with […] principal components within each patch, which can be regarded as whole spectrum estimates, as they gather a great proportion of the total variance contained in all the variables originally extracted from the reflectance spectra measurements. It also allowed us to do so with data from both years.

Finally, thanks to the use of the GLMM-based method, we could calculate confidence intervals, useful indicators of the reliability of our repeatability estimates, in addition to p-values. In this way, and together with the advantages mentioned above, it was possible to get a more complete and reliable overall perspective of the question being studied.
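The adjusted repeatability discussed above, with year treated as a random factor whose variance enters the total, can be written as a simple ratio of variance components. The Python sketch below shows only that final ratio. The function name and the variance values are hypothetical; in practice the components would come from a fitted mixed model, which is not reproduced here, and the exact model specification used in the study is an assumption of this sketch.

```python
def adjusted_repeatability(var_individual, var_year, var_residual):
    """Among-individual variance as a share of total variance.

    Year variance is included in the denominator, following the description
    of adding the year variance into the total variance calculation.
    """
    total = var_individual + var_year + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_individual / total


# Hypothetical variance components (illustrative values only).
r_adjusted = adjusted_repeatability(6.0, 1.0, 3.0)

# For comparison, the same data with no year component in the model.
r_single_year = adjusted_repeatability(6.0, 0.0, 3.0)
```

Pooling both years into one estimate is what allows a single overall figure per patch, and resampling the fitted model is one common way to attach a confidence interval to such a ratio.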

The fact that almost all the repeatability measurements, and especially the repeatabilities across field and lab methods (in patches other than the throat), were higher when including only the human-visible spectrum in the analyses suggests that the noisy reflectance pattern in the UV part of the spectrum may be decreasing the repeatability and leading us to underestimate the comparability of the two methods. For throat plumage, however, we observed the opposite trend, with higher repeatability values when including the whole spectrum, which could indicate that the UV part of the spectrum is more important in the throat than in the rest of the patches. We find support for this idea when looking at reflectance spectra plots for […]

Our results suggest that collecting feathers from live animals and assessing colouration in the lab is a better approach for measuring plumage ornamentation, yielding more repeatable and reliable results than direct measures on live birds in the field. In addition, since it is easier on equipment and minimises the length of time birds need to be handled (minimising the stress levels inflicted on them), feather sampling would appear to be the best method available.

In addition, from a statistical point of view, our results support the superiority of the GLMM-based method [41]