Evidence for multifactorial processes underlying phenotypic variation in bat visual opsins

Studies of opsin genes offer insights into the evolutionary history and molecular basis of vertebrate color vision, but most assume intact open reading frames equate to functional phenotypes. Despite known variation in opsin repertoires and associated visual phenotypes, the genetic basis of such patterns has not been examined at each step of the central dogma. By comparing sequences, gene expression, and protein localization across a hyperdiverse group of mammals, noctilionoid bats, we find evidence that independent losses of S-opsin arose through disruptions at different stages of protein synthesis, while maintenance relates to frugivory. Discordance between DNA, RNA, and protein reveals that the loss of short-wave sensitivity in some lineages resulted from transcriptional and post-transcriptional changes in addition to degradation of open reading frames. These mismatches imply that visual phenotypes cannot reliably be predicted from genotypes alone, and connect ecology to multiple mechanisms behind the loss of color in vertebrates.


Introduction 40
Studies of opsin gene sequences, which encode the photoreceptor proteins within rod and cone 41 cells, have been crucial to understanding the molecular basis and evolutionary history of visual 42 perception in animals, and provide some of the most compelling examples of how substitutions in 43 the genetic code can bring about adaptive phenotypic change (Lucas et al., 2003;Porter et al., 44 2012;Yokoyama et al., 2008). In particular, comparisons of amino acids thought to underpin 45 spectral tuning help infer the visual abilities and sensory ecologies of extant taxa and, through 3 ancestral sequence reconstructions, extinct hypothetical forms (Hunt et al., 2009;Zhao, Rossiter, 47 et al., 2009). Other studies have revealed allelic variation in opsin genes within populations (Jacobs 48 et al., 2017), and the contribution of duplications (Cortesi et al., 2015;Feuda et al., 2016) and 49 adaptive gene losses (Davies et al., 2009) to photoreceptor repertoires. 50 51 Inferring phenotype solely from gene sequence, however, rests on the assumption that genetic 52 information is invariably transcribed from DNA to RNA, and translated from RNA to protein in 53 the presence of an intact open reading frame. However, while a growing number of studies have 54 examined the expression of opsin genes (Cheng & Flamarique, 2004) and/or their protein products 55 (Peichl et al., 2017;Wikler & Rakic, 1990), no comparative studies of opsins have probed each 56 step of this central dogma. Thus, the extent to which visual phenotypes are expressed or masked 57 due to the modulation of protein production is currently unknown. This represents a major gap in 58 our understanding of visual evolution, as mounting evidence from a range of systems is beginning 59 to reveal that complex post-transcriptional and post-translational routes shape phenotypic variation 60 and complicate genotype-to-phenotype mapping (Blount et al., 2012;Csardi et al., 2015;61 12 confirms the presence of a functional open reading frame in an additional species from the same 253 family as Chilonatalus (Emerling et al., 2015). As such mismatches between DNA and mRNA 254 have not previously been documented in bat opsins, more work is needed to both explore the extent 255 to which this occurs, as well as the molecular mechanism that underlies it. 256 257 Perhaps the most unusual result of this study was the apparent failure of an intact transcript to be 258 translated into protein in several taxa. Although the underlying cause of these disruptions in protein 259 translation can at this point only be inferred, close inspection of the transcripts involved suggests 260 that more than one molecular mechanism may be involved. In two taxa, Erophylla bombifrons and 261 Phyllonycteris poeyi, examination of the assembled transcript contained retained introns 262 (confirmed via BLAST). Transcriptional readthrough is well documented [e.g., (Gaidatzis et al., 263 2015;Vilborg & Steitz, 2017)] and such carryover of intronic sequence into OPN1SW would 264 interfere with protein formation. The presence of introns in OPN1SW mRNA has previously been 265 documented in the blind mole-rat Spalax ehrenbergi, which also lacks S-cones (David-Gray et al., 266 2002;Esquiva et al., 2016). In another case, multiple individuals of Pteronotus quadridens showed 267 variation in S-cone presence among conspecifics, indicating possible ongoing degradation of 268 protein synthesis. These mechanisms are not mutually exclusive: a similar result was found for P. 269 poeyi, for which S-cones were detected in four animals, alongside a seemingly non-functional 270 transcript detected in another that contained retained intronic sequences as well as a 4 base-pair 271 insertion in exon 3, again verified by manual PCR. 272 273 Taken together, our findings reveal assessments of visual perception based purely on genotypic 274 analyses of opsin sequences, or RNA transcripts, can be misleading, and may even obscure the 275 evolutionary processes and ecological agents of selection. Although variation in the complement 276 13 of photoreceptors across vertebrates is usually explained by disruptions to the protein-coding 277 sequence [e.g., (Mundy et al., 2016;Zhao, Rossiter, et al., 2009)], our findings of mismatches 278 between genotype and phenotype also indicate a role for transcriptional and even translational 279 control in this process. 280

281
Our results further allow us to hypothesize the developmental processes contributing to variation 282 in visual perception among bat taxa. Unlike species of Mus (Ortín-Martínez et al., 2014), in which 283 many cones express more than one photoreceptor, almost all cones in the bats we analyzed 284 expressed a single opsin type (OPN1SW or OPN1LW). Given this constraint, if S-opsin cones 285 were lost through developmental conversion into L-opsin cones, then, species with S-opsin cones 286 should have fewer L-opsin cones than species without S-opsin cones. Instead, species with S-opsin 287 cones actually have more L-opsin cones. Therefore, our findings are more consistent with the loss 288 of S-opsin cones in Yangochiroptera occurring during the process of cone determination. 289 Comparisons of photoreceptor development across ontogeny are needed to test this hypothesis. 290 291 Considering the distribution of losses throughout the phylogeny, and the many, diverse forms of 292 loss we identified, independent losses of S-cones must have occurred during the radiation of 293 noctilionoid bats. At a much broader phylogenetic scale, independent losses of S-opsin cones have 294 been documented more than 20 times throughout mammalian evolution (Emerling et al., 2015;295 Hunt & Peichl, 2014), and monochromatic -often nocturnal-species are usually closely related 296 to dichromatic relatives. Cases of loss-of-function in OPN1SW genes caused by the accumulation 297 of premature stop codons and other mutations have been documented across many vertebrate 298 lineages, and are generally inferred to require many generations (Kawamura & Kubotera, 2004). 299 14 In contrast, decreases in gene expression, which may represent the first step toward functional loss, 300 can evolve rapidly within populations (e.g., in cave fish) (Tobler et al., 2010). 301 302 Mismatches between mRNA transcripts and protein assays for OPN1SW in bats reveal the 303 evolution of photoreceptor composition and repartition to be flexible compared to 304 pseudogenization. OPN1SW genes with intact open reading frames in lineages that lack the 305 OPN1SW protein show few amino acid substitutions. This suggests mismatches between 306 transcription and expression evolve rapidly and may explain low rates of evolution whether or not 307 the S-cones are present (Table S2). Once the OPN1SW gene is pseudogenized, however, the 308 degradation of the gene seemingly becomes irreversible and a manifold increase in the protein-309 coding substitution rate ensues. Both frugivorous and non-frugivorous lineages experience strong 310 purifying selection to conserve ancestral function, highlighting post-transcriptional regulation as a 311 more direct response to ecology than pseudogenization of the relevant opsin (Table S4). These 312 results therefore suggest that protein composition should more closely reflect visual ecology than 313 high rates of sequence evolution and pseudogenization in the relevant opsin, as the latter only 314 responds to long-term functional loss. 315 316 Despite these observed widespread losses, the conservation of S-cones, OPN1SW transcription, 317 and protein-coding sequences in most species implies that S-cones have an important function in 318 most noctilionoid bats. To identify the likely ecological determinants of variation in S-cone 319 presence across the clade, we applied phylogenetic regressions, and found the predominance of 320 fruit consumption -and notably not of plant or flower-visiting, or insectivory-was the single 321 most powerful explanatory factor. This result strengthens the role of diet as the primary agent of 322 15 selection that maintains S-opsin function, and counters previous speculation that nectarivorous 323 New World leaf-nosed bat species rely on UV reflectance to locate flowers through short-wave 324 vision (Winter et al., 2003). Instead, we found some predominantly nectarivorous species such as 325 Erophylla spp. and Lonchophylla robusta have lost their S-cones, while others have retained them 326 (e.g., Anoura geoffroyi). This both suggests more than one strategy to finding flowers has evolved 327 among New World leaf-nosed bats, and contrasts with members of the primarily frugivorous 328 subfamily Stenodermatinae, which invariably retain their S-cones ( Figure 2). 329

330
In primates, the ability to discriminate between fruit and foliage has been identified as an 331 ecological advantage of daylight trichromacy (Fedigan et al., 2014), and bats that include fruit in 332 their diets could benefit from some degree of color discrimination. Although bats that possess both 333 L-and S-cones are either monochromatic or dichromatic in daylight, and conditionally dichromatic 334 or trichromatic in mesopic (i.e., crepuscular) conditions (Zele & Cao, 2015), the ability to detect 335 UV light might be important in fruit detection. Indeed, several stenodermatine bat species analyzed 336 here appear to be able to perceive UV light based on inferences from spectral tuning of OPN1SW 337 photopigments (Zhao, Rossiter, et al., 2009), behavioral tests of responsiveness to UV light 338 (wavelengths < 400 nm) (Gorresen et al., 2015;Winter et al., 2003), and electroretinographic 339 recordings of visual sensitivity (Müller et al., 2009). In plant species with animal-mediated seed 340 dispersal, dispersers are attracted to ripening fruits via a combination of visual and olfactory cues 341 that include the emission of attractive volatile compounds, a reduction of toxic compounds, and 342 changes in texture and color [e.g., (Goff & Klee, 2006;Weiblen et al., 2010;Wendeln et al., 2000)]. 343 Increases in UV reflectance also correlates with the decline of chlorophyll and ripening in several 344 fruits [for reviews see (Blanke & Lenz, 1989;Lagorio et al., 2015;Thies et al., 1998) contributing to their rapid taxonomic and ecological diversification . 350

351
Since the consumption of fruit arose as an evolutionary innovation within Yangochiroptera, 352 selection for this novel niche cannot explain the ancestral or present-day persistence of S-cones in 353 non-frugivorous species (Figure 2). While the retention of S-cones in bats that feed on animals 354 (e.g., Chrotopterus auritus and Noctilio leporinus) is surprising, recorded polymorphisms in the 355 presence of S-cones among some individuals of insectivorous P. quadridens could point to an 356 ongoing loss of S-cones in some of these lineages. Comparisons with L-opsin cones offer insights 357 into the general functional role of S-cones. While fruit consumption was not a covariate of L-opsin 358 cone density, the presence of S-cones is associated with an estimated 43% increase in the density 359 of L-opsin cones (Table 1), consistent with both types of cones serving a common functional role. 360 This role could be to increase visual sensitivity at dusk, similar to crepuscular and occasionally 361 diurnal Old-World fruit bats (Zhao, Rossiter, et al., 2009). As a result, light capture, instead of the 362 detection of novel visual cues, might have been the primary agent of selection on the density of S-363 and L-opsin cones in non-frugivorous lineages, as well as ancestral bats. The presence of S-cones 364 in ancestral noctilionoid bats might have then allowed them to take advantage of fruit as a novel 365 fruit source during their subsequent diversification, in a case of trait exaptation. 366 367

Conclusions 368
Vertebrate sensory adaptations have been the focus of intense research over the last fifteen years, 369 and bat vision has not been an exception [e.g., (Eklof & Jones, 2003;Feller et al., 2009)]. These 370 studies have upended formerly prevalent views on bat biology and behavior, finding increasingly 371 important roles for vision (and olfaction) in orientation and food localization, even among 372 echolocating species [e.g., (Altringham & Fenton, 2003;Korine & Kalko, 2005)]. But while 373 previous analyses of isolated lineages revealed the existence and extent of mesopic and photopic 374 cone-based vision in fewer than five noctilionoid species (Müller et al., 2009;Winter et al., 2003), 375 we examined dozens of ecologically diverse bat lineages throughout the Yangochiroptera suborder 376 and found a much more complex picture than was previously assumed. 377

378
We uncovered mismatches between OPN1SW transcripts and S-opsin cone phenotypes, as well as 379 widespread S-cone loss, with conservation linked to multiple factors including a novel dietary 380 niche. Our results also highlight the importance of rapid trait loss in evolution, with apparent shifts 381 in translation that precede pseudogenizing changes in open reading frames. As genotype-centered 382 analyses would miss important functional changes, our study illustrates the importance of a broad 383 comparative approach when studying sensory innovation, loss, and adaptation, as well as their 384 influence on taxonomic and ecological diversification. 385 386

Materials and methods 387 388
Species sampling and tissue preparation We obtained eye tissue from 59 New World bat species, 389 of which 49 were collected from the wild and 34 from museums, with 24 species common to both 390 sources (Table S1) We tested for the presence of the three focal gene transcripts (RHO, OPN1SW and OPN1LW) in 404 each bat transcriptome using a reciprocal best hit blast approach against the full set (n=22,285) of 405 human protein-coding genes from Ensembl 86 (Yates et al., 2016). To confirm the absence of 406 OPN1SW sequence, we performed additional steps in several species. First, we cut, trans-407 chimeras, which can prevent detection by reciprocal blast (Yang & Smith, 2013), and repeated the 408 reciprocal blast. Second, we manually screened sequences that were initially identified as matching 409 OPN1SW, but did not pass initial blast filtering (see Supplementary Information) was then performed on dissected retinas following standard procedures as in Ortin-Martinez et al. 417 (2014), with the appropriate mixture of primary and secondary antibodies (see Supplementary 418 Information for details). For each species, IHC was performed on replicates from at least three 419 individual retinas. We considered no labeling to indicate a true loss of the cone type following 420 references (Müller et al., 2009;Müller et al., 2007;Ortín-Martínez et al., 2014). Labelled retinas 421 were then imaged on a confocal microscope at 20X (LSM710; Zeiss Microscopy, see 422 Supplementary Information for details). The entire retina surface was captured using 512x512 423 pixel tiles and counted using a 3D object counter plugin using Fiji (ImageJ). The accuracy of this 424 approach was verified manually (Supplementary Information). Opsin density for each species was 425 calculated by averaging across three individuals. We also characterized the spatial distribution of 426 L-and S-cone densities for 14 species. 427 428 Opsin gene evolution We used aligned sequences from the transcriptomes of 38 species together 429 with those from six noctilionoid genomes (Zepeda Mendoza et al., 2018) to estimate rates of 430 molecular evolution of visual opsin genes (OPN1SW, OPN1LW, and RHO) in focal bats. First, we 431 tested for divergent selection modes among species that had S-opsin cones, lacked the S-opsin 432 cones but had an intact mRNA sequence, and those that lacked the S-opsin cones but either did 433 not have OPN1SW transcripts or had a pseudogenized OPN1SW sequence (Figure supplement 2) 434 using the Branch Model 2 of codeml in PAML 4.8a (Yang, 2007). Second, we applied the same 435 approach to test divergent selection modes between frugivorous and non-frugivorous bat species 436 Ecological correlates of cone presence and density To determine whether cone phenotypes are 439 explained by dietary specialization, we applied the hierarchical Bayesian approach implemented 440 in the R packages MCMCglmm and mulTree (Guillerme & Healy, 2014;Hadfield, 2010), using a 441 random sample of trees from the posterior distribution of published phylogenies (Rojas et al., 2016;442 Shi & Rabosky, 2015). We modeled S-cone presence as function of diet, and compared models 443 using the Deviance Information Criterion (DIC) (Gelman, 2004). Using the predictor variable from 444 the best-fit model for presence/absence of S-opsin cones, we then repeated this approach to explain 445 L-cone density (see Supplementary Information).  Grabherr, M. G., Haas, B. J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I., Adiconis, X., 543 Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., 544 Rhind, N., di Palma, F., Birren, B. W., Nusbaum, C., Lindblad-Toh, K., et al. (2011). Full-545 length transcriptome assembly from RNA-Seq data without a reference genome. Nature 546 Biotechnology, 29 (7)  Adaptive molecular evolution in the opsin genes of rapidly speciating cichlid species. 644 Molecular Biology and Evolution, 22 (6) Yates, A., Akanni, W., Amode, M. R., Barrell, D., Billis, K., Carvalho-Silva, D., Cummins, C., 670 Clapham, P., Fitzgerald, S., Gil, L., Girón, C. G., Gordon, L., Hourlier, T., Hunt, S. E., 671 Janacek   invertebrates -moth, vertebrates -frog, fruit -fruit and nectar/pollen -flower. The species 714 phylogeny follows (Rojas et al., 2016;Shi & Rabosky, 2015). Vertical black bars, from left to 715 right, indicate: (1) Noctilionoidea, (2) Phyllostomidae, (3) and Stenodermatinae, respectively. 716 RNA-Seq data was generated to both infer the presence of an intact ORF (in combination with 717 genomic and PCR sequence data) and determine the presence of an expressed mRNA transcript. 718 The presence of an intact ORF and mRNA transcript for RHO was verified across all 719 transcriptomes. The presence of an intact ORF, mRNA and protein are indicated by a filled color 720 marker (OPN1SW -purple, OPN1LW -green and RHO -blue), and its absence by a white marker. 721 Missing data (i.e. species for which we were unable to obtain tissue) are indicated with a grey 722 marker. Mismatches between intact ORFs and transcripts are indicated by an inequality symbol. invertebrates -moth, vertebrates -frog, fruit -fruit and nectar/pollen -flower. The species 751 phylogeny follows (Rojas et al., 2016;Shi & Rabosky, 2015). Vertical black bars, from left to 752 right, indicate: (1) Noctilionoidea, (2) Phyllostomidae, (3) and Stenodermatinae, respectively. 753 RNA-Seq data was generated to both infer the presence of an intact ORF (in combination with 754 30 genomic and PCR sequence data) and determine the presence of an expressed mRNA transcript. 755 The presence/absence of a protein product for S-opsins was assayed by IHC on flat mounted 756 retinas. The presence of an intact ORF, mRNA and protein are indicated by a filled color marker, 757 and its absence by a white marker. Missing data (i.e. species for which we were unable to obtain 758 tissue) are indicated with a grey marker. Mismatches between intact ORFs and transcripts, or 759 between transcripts and protein data are indicated by an inequality symbol. Note: OPN1SW protein 760 assays for P. quadridens revealed polymorphisms within the sample, and we recorded positive 761 OPN1SW assays in some P. poeyi individuals despite an apparent disrupted ORF.

Species sampling
We obtained eye tissue samples from a total of 263 eyes from 232 individuals, representing 59 bat species from seven families. This includes three families from the focal Noctilionoidea superfamily and four closely related outgroup families. Specimens used for this study were either wild-caught animals or obtained from museum collections (see Table S1 for full species and permit information). Wild-caught specimens from 49 bat species were collected following the approved IACUC protocols and site-specific permits. Bats were sampled from wild populations, and caught using mist nests set in forests and/or at cave entrances. Animals were handled following IACUC and site-specific protocols to minimize stress and were euthanized using an excess of isoflurane. Fresh bat specimens used for RNA-Seq analyses (nRNA-Seq = 45) were sampled in the Dominican Republic and Peru, and those for immunohistochemistry were sampled in the Dominican Republic, Puerto Rico, Belize, and Trinidad. Additional eye samples from 34 species were dissected from specimens from the American Museum of Natural History (AMNH). For immunohistochemistry, five bat species had replicates that were both wild-caught and from museum collections and exhibited the same phenotype, highlighting the robustness of the experiments. Due to preservation methods, the AMNH samples were only suitable for immunohistochemistry (IHC) and so were not included in the transcriptomic study (see Table S1 for AMNH specimen identification numbers).

RNA sequence analysis
Shortly after death, intact eyes were excised and placed in RNAlater and incubated at 4°C overnight before being stored at -180°C in vapor-phase liquid nitrogen, or at -80°C in a freezer. Frozen eye tissues were placed in Buffer RLT with added Dithiothreitol (DTT) then homogenized using a Qiagen TissueLyser. Total RNA was then extracted using Qiagen RNeasy® Mini kits following the manufacturer's protocol. Following extraction, RNA integrity was assessed using an Agilent 2100 Bioanalyzer and RNA concentration was measured using a Qubit Fluorometer. Library preparation was performed using Illumina TruSeq® RNA Sample Preparation v2, with 500 ng of total RNA used for each sample. Constructed libraries were pooled and sequenced on the NextSeq500 High Output Run (150 cycles) to give 2x75 base-pair (bp) paired-end (PE) reads at the Genome Centre, Queen Mary University of London. Using the above approach, we sequenced eye transcriptomes from 45 individuals, representing 38 bat species; including biological replicates for three species (Pteronotus parnellii: n = 4; Artibeus jamaicensis: n = 4 and Phyllops falcatus: n = 2).

Opsin gene annotation and examination of CDS
We used a combination of transcriptomic and genomic data to establish if the coding sequences (CDSs) of the three focal genes (OPN1SW, OPN1LW and RHO) were intact, and whether or not the mRNA of these genes was expressed in the eyes of the bats under study. We used a reciprocal best hit blast approach to establish whether or not transcripts corresponding to the three visual pigments (OPN1SW, OPN1LW and RHO) were present in each of the assembled transcriptomes. Sequences representing the protein products encoded by 22,285 human protein coding genes were downloaded from Ensembl 86 (Yates et al., 2016), for each gene product only the longest protein sequence was retained. These sequences were then used as tblastn (blast+ v.2.2.29) queries against each of the 45 bat transcriptome databases, the top hit was kept with an e-value cut-off <1e -6 . Reciprocal blasts were then carried out using blastx (blast+ v.2.2.29), with bat transcripts as queries against the human protein database, again only keeping the top hit and e-value <1e -6 . Percentage coverage of each bat hit against the human protein was calculated with the perl script analyze_blastPlus_topHit_coverage.pl available in Trinity utils. Candidate coding sequences (CDSs) were then extracted from the transcriptome assembly based on the blast coordinates using a custom perl script.
For transcriptome assemblies in which the OPN1SW sequence was not initially recovered, we undertook a number of additional steps to confirm that the sequence was not present. Firstly, as de novo transcriptome assemblies can create erroneous chimeric transcripts that may affect reciprocal blast results we followed the approach of (Yang & Smith, 2013) to reduce the number of transself-chimeras. Briefly, this involves performing a blastx search of the bat queries against the human protein database with an e-value cutoff of 0.01. Hits that met the default parameters of identity ≥30% and length ≥100 base-pairs are used to then search for either self-chimeras or multigene chimeras. Detected putative chimeras are then cut into segments, and retained if >100 basepairs. This approach is only able to screen for trans-chimeras. Following chimera detection, the initial reciprocal blast described above was repeated. Lastly, we manually re-blasted all sequences that were initially identified as matching OPN1SW, but did then not pass the stringent reciprocal blast procedure.
Finally, we mapped the reads of Carollia brevicauda against Carollia perspicillata to validate the presence of a stop codon. Mapping was performed using the script align_and_estimate_abundance.pl available from Trinity utils, and RSEM v. 1.2.31 and BOWTIE v.1.1.2.

Verification of OPN1SW sequence with PCR
For three species we amplified the genomic region spanning exons 3−4 of OPN1SW, from DNA extracted from the same individuals used to generate the RNA-Seq data. DNA extractions were carried out using Qiagen DNeasy Blood & Tissue Kit (69504). We used previously published primers (Zhao et al., 2009). PCR mixtures consisted of 12.5 µl EconoTAQ Master Mix 2x, 3 µl of each primer (10 µM), 4 µl of genomic DNA (> 100 ng). On an Eppendorf Mastercycler ProS, PCR was carried out with a single cycle at 94°C for 2 min followed by 35 cycles of 94°C for 30 sec, annealing temperature of 55°C for 40 sec, 72°C for 1 min 30 sec, and finally a single cycle of 72°C for 10 min. PCR products were visualized on a 1% TBE-agarose gel. PCR product was cleaned up using Agencourt AMPure XP and submitted for cycle sequencing. Sequences have been submitted to GenBank XXXXXX.

Immunohistochemistry -opsin relative distribution and density
For the IHC assay, eyes were fixed overnight at 4°C in phospho-buffered saline (PBS) 4% paraformaldehyde (PFA), transferred into 1% PBS, and stored in 1% PFA at 4°C in 1% PBS until further processing. We supplemented the IHC sample by obtaining eyes from 34 additional bat species from the alcohol collections of the AMNH. As these specimens had been stored in 70% ethanol, once excised tissues were rehydrated in 1% PBS prior to dissection and then stored as above. Prior to processing, retinas were dissected from the eyeballs and were flattened by making three or four radial incisions from the outside of the retina inwards, with the deepest cut in the nasal pole. Immunodetection was carried out following standard procedures described in Ortin-Martinez et al. (2014). Retinas were permeated with two washes in PBS 0.5% Triton X-100 (Tx) and frozen for 15 minutes at −70°C in 100% methanol. Retinas were then thawed at room temperature, rinsed twice in PBS-0.5%Tx and incubated overnight at 4°C in the appropriate mixture of primary antibodies diluted in blocking buffer (PBS, 2% normal donkey serum, 2%Tx). The next day, retinas were washed four times in PBS-0.5Tx and incubated for 2 hours at room temperature in secondary antibodies diluted in PBS-2%Tx. Finally, retinas were thoroughly washed four times in PBS-0.5%Tx and, after a last rinse in PBS, mounted scleral side up on slides in anti-fading solution (Prolong Gold Antifade, Thermofisher). For each species, IHC was performed on at least three retinas from at least three field-collected individuals, and three retinas from two individuals for museum-sampled species (for details see Table S1). Given the consistency of the detection of these antibodies across all bats and other mammals (Müller et al., 2009;Müller et al., 2007;Ortín-Martínez et al., 2014), and the number of replicates and individuals examined, we interpreted no labeling to indicate a true loss of the respective cone type.

IHC -Photoreceptor quantification
Flat-mounted retinas were photographed using a 20x objective on a confocal microscope (LSM710; Zeiss Microscopy). 564 and 633 lasers were used to excited Alexa 568 and Alexa 647 dyes, each labelling respectively S-and L-opsins. Each entire retina was completely imaged using 512x512 pixel tiles. For each retina, each tile was then Zstacked and automatically counted using a 3D object counter plugin using Fiji (ImageJ). The accuracy of this automatic approach was verified by manually counting three biological replicates of five bat species, by two different people. For each retina quantified, the density was calculated for each tile and then averaged for each individual (total count was average over three individuals) and for each species (by averaging the average of the three individuals). The spatial distribution of L-and S-cone density was visualized for the following 14 species: Mormoops blainvillei, Pteronotus quadridens, Macrotus waterhousii, Desmodus rotundus, Gardnerycteris crenulatum, Monophyllus redmani, Glossophaga soricina, Erophylla sezekorni, Brachyphylla nana pumila, Carollia sowelli, Sturnira lilium, Artibeus jamaicensis, Artibeus phaeotis, and Phyllops falcatus (see Figure 3, Table S4).

Opsin gene evolution
We obtained protein-coding sequences for OPN1LW and RHO from the transcriptomes of all 38 species and included sequence data for Eptesicus fuscus (from GenBank), for a total of 39 species. For OPN1SW, we obtained sequence data for 28 species from the transcriptome data sets. We then supplemented the OPN1SW nucleotide sequences with those extracted from genome data using a combination of blastn and bl2seq on five noctilionoid bat genomes (Artibeus jamaicensis, Desmodus rotundus (Zepeda Mendoza et al., 2018), Lionycteris spurrelli, Macrotus waterhousii, Mormoops blainvillei and Noctilio leporinus) using the Miniopterus natalensis OPN1SW (XM_016213323.1) sequence as a query. The L. spurrelli and M. waterhousii genomes were sequenced by the Rossiter Lab, and the A. jamaicensis, M. blainvillei and N. leporinus genomes were made available by the Broad Institute. We also obtained the OPN1SW sequences from E. fuscus from GenBank. The extracted and aligned sequences are available from DRYAD XXXX.
Sequences for OPN1LW and RHO were aligned using MUSCLE v3.8.425 (Edgar, 2004) as translated amino acids to keep the sequences in frame. The software implementation of our model requires the alignment to be in frame without any stop codons, therefore, stop codons at the end of the reading frame were removed from the alignment. Due to stop codons and indels, nucleotide sequences for OPN1SW were aligned by eye, with columns containing disruptions to the reading frame being removed to keep the remaining sequences in frame. Hypervariable regions at the beginning or end of sequences that may be caused assembly errors were masked by 'Ns' in two cases (Phyllops falcatus and Phyllonycteris poeyi). Additionally, premature stop codons were masked with 'Ns' and columns containing insertions that shifted the translation frame were deleted to keep codons in frame.
We tested for whether there were differences in rates of molecular evolution in the three opsin genes by estimating the ratio of the rates of nonsynonymous to synonymous substitutions (ω) for different branch classes. We set up two frameworks: S-cone presence and diet. When looking at S-cone presence variation, in our 2-branch class test, we estimated differences for branches that lacked the S-cone protein (ωS-cone.absent), and those that had the S-cone protein present (ωbackground). We also designed a three branch-class test in which we estimated different rates for bats with Scones (ωbackground), bats that lack S-cones but have an intact reading frame for the OPN1SW transcripts (ωOPN1SW.intact), and bats that lack S-cones and OPN1SW is a pseudogene (ωOPN1SW.pseudo). If bats that lack the S-cone experience relaxed selection, we expect higher rates of ω in bats without the S-cone in the OPN1SW gene, but no differences among groups in the other two opsins. The sequences from lineages for which no S-cone data was available were removed from this analysis (n = 8). Finally, we tested if there were differences in rates in frugivorous lineages (ωfrugivore) and all other bats (ωbackground). Figure S2 depicts branches labeled with respective branch classes. Frugivory data was available for all lineages, and thus all available sequences were used in this analysis.
These analyses were performed using the branch model implemented in the codeml routine of PAML 4.8a (Yang, 2007). Differences among branches were compared against estimates for a single ω for all branches (ωbackground). We used a likelihood ratio test to compare the best-fit model for each opsin gene. The analysis used the species topology that merged a recently published phylogeny of all bats (Shi & Rabosky, 2015) with that of a recently published noctilionoid tree (Rojas et al., 2016). The tree was trimmed using the geiger v. 2.0.6 package in R (Harmon et al., 2008).

Ecological correlates of cone presence and density
A hierarchical Bayesian approach was used to quantify the influence of ecology on the presence of S-cones while accounting for the phylogenetic correlation between observations from different species. A hierarchical approach is often called a mixed model in the literature, with cluster-specific effects called "random", and sample-wide effects called "fixed". As different fields apply random and fixed to different levels of the hierarchy, here we adopt the language of cluster-specific and sample-wide effects (Gelman, 2005). The effect of species was quantified by including species as a cluster-specific, or random effect in the R package MCMCglmm (Hadfield, 2010). Additionally, to address variation among different estimates of phylogeny, we used the R package mulTree (Guillerme & Healy, 2014) to run the Bayesian models across a sample of trees obtained from the posterior of phylogenies.
The first set of models explained presence or absence of S-cones as a series of ecological dummy variables coded as prevalence or non-prevalence of plant materials, fruit, or other (insects or small vertebrates) items in diet. In the sample-wide or fixed portion of these models, observations y for each species from 1 to i for each diet category 1 to j correspond to a single-trial binomial response of the probability of observing S-cones given by pri such that:

yi~ dbern(pri) logit(pri)= a+b.diet[dietj]
Models for each of the dietary categorizations were then compared using the Deviance Information Criterion or DIC. For Bayesian models, a lower DIC can indicate a better fit of the model to the data (Gelman, 2004). The predictor variable identified by the best-fit model for presence/absence of S-cones was then used in subsequent analyses of cone density.
Analyses of the sample-wide or fixed portion of cone density modeled the natural logarithm of the cone density y for each observation i as a function of dummy predictor variables defined by an diet group j, or S-cone group k. ln(yi) was modeled as a random, normally distributed variable with mean mu and variance, as below:

ln(yi)~ dnorm(mu, variance) ln(yi)= a+b.diet[dietj], or ln(yi)= a+b.S-cone[S-conek]
Unlike the presence/absence analyses, these response variables were normally distributed, with the sample-wide portion of the model accounting for the effect of diet or S-cone presence on L-cone density. The cluster-specific or random effect accounted for both the relationships between species and the clustering of observations when more than one measurement was taken for each species.
To estimate the covariation arising from phylogeny for all species analyzed, we used the phylogeny of Shi and Rabosky (2015) for non-noctilionoids and as a base tree. For New World noctilionoids, a posterior sample of 100 trees from the New World noctilionoid phylogenies of Rojas et al. (2016) was grafted onto the base tree and used as input in mulTree analyses. Each regression 6 corresponding to one of the 100 phylogenies ran for 20M generations, sampling every 1000 generations, with a burn-in of 100,000. Each regression ran 2 separate chains, assessing postburnin convergence by comparing posteriors and reaching estimated sampling sizes (ESS) of at least 200 for all model parameters.
MCMCglmm uses inverse Wishart distributions for priors on sample-wide variance and clusterspecific or phylogenetic variance. For uncorrelated predictor variables, these functions collapse into inverse gamma distributions for residuals. The choice of the residual prior is particularly important for phylogenetic logistic regressions (Ives & Garland, 2014), for which this variance is not identified in the likelihood (Hadfield, 2010). Hence, to properly estimate the posterior, we fixed the residual variance at 1. As the phylogenetic structure of the data corresponds to a matrix structure of correlations between observations, it becomes necessary to expand the parameters by specifying both the mean and covariance matrix on the prior, in addition to the shape and scale parameters given by nu/2. For the logistic regressions, the prior on the phylogenetic structure was given by nu = 1000 and V = 1 (generating a very long-tailed distribution), with prior mean of 0 and covariance of 1. For Gaussian regressions, the prior on the residuals was given by nu = 1 and V = 1 (Hadfield, 2016).
The results from all 100 posterior trees were combined to generate parameter estimates and 95% high-probability density from the joint posterior probability distribution. Predictors were found to influence the response if the high-probability density for their coefficient did not overlap with 0. We also calculated the explained variance or R 2 based on the method proposed by Nakagawa and Schielzeth (2013). Two types of R 2 were calculated for hierarchical Bayesian models, based only on the sample-wide factors or marginal R 2 , and based on both sample-wide and cluster specific factors, or conditional R 2 .

RNA quality and sequencing statistics
Integrity of extracted RNA varied across samples (RIN 4.1 to 10), the majority of samples obtained RINs greater than the recommended value of 8. Following sequencing, the number of raw reads ranged from 9,129,366 to 47,639,831 per sample. Cleaned reads ranged from 8,659,526 to 45,280,391 per sample, which resulted in assemblies of 67,459 to 131,925 across species.
For three species (Glossophaga soricina, Vampyrodes caraccioli and Artibeus bogotensis), the OPN1SW transcript was recovered following chimera removal as the transcript was initially joined to that of CALU. The OPN1SW transcript recovered for Carollia brevicauda contained an internal stop codon, however, this was not shared by Carollia perspicillata. Mapping of the C. brevicauda short reads against C. perspicillata using the script align_and_estimate_abundance.pl available from Trinity utils, and RSEM and bowtie, suggested low-coverage and mapped reads did not show this mutation, thus this mutation might not be genuine. Manual blasting of the Phyllonycteris poeyi and Erophylla bombifrons assemblies for OPN1SW recovered several fragmentary sequences that could represent this gene. However, sections of the sequences were highly divergent and were found to represent intronic regions by BLAST implying there could be retained introns in the mRNA.
In most species examined either both the OPN1SW transcript and protein were detected (n = 32), or neither were detected (n = 5). For example, all members of the Stenodermatinae clade of fruiteating bats examined were found to have both the OPN1SW transcript and protein. Outside of this clade, we also detected the OPN1SW protein, and in most cases confirmed transcript presence with RNA-Seq, in species distributed widely throughout the phylogeny. Including within Emballonuridae (Saccopteryx bilineata and S. leptura), Noctilionidae (Noctilio leporinus), Mormoopidae (Pteronotus quadridens), and other Phyllostomidae (Phyllostomus elongatus, Anoura geoffroyi, Glossophaga soricina, Carollia perspicillata, Rhinophylla fischerae, and R. pumilio).
Opsin gene evolution The alignments used for input for the PAML analyses resulted in 349 codons for OPN1SW, 364 for OPN1LW, and 348 for RHO. For the analysis testing the difference in ω rates between species that do and do not express the S-cone protein, there was no difference in ω estimates between branch classes for the RHO (χ 2 (1) = 0.12, P = 0.73; χ 2 (2) = 0.41, P = 0.81). However, there was a difference favoring the three-branch class model for OPN1SW (ωbackground = 0.11; ωOPN1SW.intact = 0.07; ωOPN1SW.pseudo= 0.79; χ 2 (2) = 129.9, P < 1.0e-5) and OPN1LW (ωbackground = 0.08; ωOPN1SW.intact = 0.08; ωOPN1SW.pseudo= 0.17; χ 2 (2) = 6.9, P = 0.03) gene (Table S2). For the analysis testing for difference in ω rates between frugivorous species and background branches, there was no difference in ω estimates between branch classes for the OPN1SW (χ 2 (1) = 0.4, P = 0.53) and OPN1LW (χ 2 (1) = 0.49, P = 0.48) genes (Table S3). However, there was a difference in ω for the RHO gene (Table S3), in frugivorous lineages showed significantly lower rates than background branches (ωbackground = 0.04; ωfugivory= 0.02; χ 2 (1) = 11.0, P = 9.1e-4). Figure S4 summarize the results from Bayesian regressions. The frequency distributions of the DIC for three sets of phylogenetic regressions of S-cone presence against particular diet prevalence reveal the prevalence of fruit results in the least DIC of any model, providing the best fit to these data (Fig.   8   S4). The posterior estimates of parameters for this model reveal the prevalence of fruit in diet increases the odds of having the S-cone by a factor of 37.7 (given by e^coefficient of fruit prevalence, or from odd of ~0.34 to ~12.8, Table S5). Analyses of the density of long-wave cones revealed no statistically meaningful effect of fruit prevalence on this density (Table S6). Instead, analyses of the density of long-wave cones as a function of the presence of S-cones revealed having the S-cones increases the density of L-cones by 0.43 in the natural logarithm scale (or a factor of ~1.54 in the linear scale) compared to species without S-cones (or from a baseline of ~3944 to ~6063, Table S7). Table S1 -as separate Excel file Specimen and sampling information for the tissues used by this study.

Table S2
Results of molecular evolution branch analyses for each of the three opsin genes tested for differences in rates of nonsynonymous to synonymous substitutions (ω) for lineages that lack the S-cone and lineages that have retained the S-cone. Grey boxes indicate the preferred model inferred from a likelihood ratio test. lnL: log-likelihood; np: number of parameters; TL: tree length; k: kappa (transition/transversion ratios); LR: likelihood ratio; p: p-value of likelihood ratio of alternative relative to null for each test

Table S3
Results of molecular evolution branch analyses for each opsin gene that test for differences in rates of nonsynonymous to synonymous substitutions (ω) for frugivorous and non-frugivorous lineages. Grey boxes indicate the preferred model inferred from a likelihood ratio test. lnL: log-likelihood; np: number of parameters; TL: tree length; k: kappa (transition/transversion ratios); LR: likelihood ratio; p: p-value of likelihood ratio of alternative relative to null for each test

Gene
Model lnL np TL k LR p ω    Rojas et al. (2018), dietary types are indicated with the following symbols: invertebrates -moth, vertebrates -frog, fruit -fruit and nectar/pollen -flower. The species phylogeny follows (Rojas et al., 2016;Shi & Rabosky, 2015). Vertical black bars, from left to right, indicate: (1) Noctilionoidea, (2) Phyllostomidae, (3) and Stenodermatinae, respectively. RNA-Seq data was generated to both infer the presence of an intact ORF (in combination with genomic and PCR sequence data) and determine the presence of an expressed mRNA transcript. The presence of an intact ORF and mRNA transcript for RHO was verified across all transcriptomes. The presence/absence of a protein product for S-and L-opsins was assayed by IHC on flat mounted retinas. The presence of an intact ORF, mRNA and protein are indicated by a filled color marker, and its absence by an empty marker. Missing data (i.e. species for which we were unable to obtain tissue) are indicated with a grey marker with grey outline. Finally, a grey marker with no outline indicates the failure of protein assay for some species represented by museum specimens. Mismatches between intact ORFs and transcripts, or between transcripts and protein data are indicated by an inequality symbol. Note: OPN1SW protein assays for Pteronotus quadridens revealed polymorphisms within the sample, and we recorded positive OPN1SW assays in some Phyllonycteris poeyi individuals despite an apparent disrupted ORF.