Multiple lineages, same molecular basis: task specialization is commonly regulated across all eusocial bee groups

A striking feature of advanced insect societies is the existence of workers that forgo reproduction. Two broad types of workers exist in eusocial bees: nurses which care for their young siblings and the queen, and foragers who guard the nest and forage for food. Comparisons between this two worker subcastes have been performed in honeybees, but data from other bees are scarce. To understand whether similar molecular mechanisms are involved in nurse-forager differences across distinct species, we compared gene expression and DNA methylation profiles between nurses and foragers of the buff-tailed bumblebee Bombus terrestris and of the stingless bee Tetragonisca angustula. These datasets were then discussed comparatively to previous findings on honeybees. Our analyses revealed that although the expression pattern of genes is often species-specific, many of the biological processes and molecular pathways involved are common. Moreover, DNA methylation and gene expression correlation were dependent on the nucleotide context.


Introduction 28
Caste specialization in eusocial insects is a notorious example of polyphenism, 29 where multiple morphological and behavioural phenotypes emerge from the same 30 genotype 1,2 . In social Hymenoptera (bees, wasps and ants), queen and worker 31 reproductive castes perform distinct functions in the colony. While queens undertake 32 reproductive duties, workers perform all the other tasks necessary for nest 33 maintenance and growth 3 . Two broad categories of workers exist in eusocial bees: 34 nurses and foragers 4,5 . Nurses are responsible for comb construction, offspring/queen 35 care and internal colony maintenance, while foragers perform tasks related to external 36 colony defence and resources provisioning 5,6 . In advanced eusocial bee species, such 37 as honeybees, worker subcastes are mainly age determined; younger bees are nurses 38 and when they become older, they switch to being foragers 7,8 . In primitively eusocial 39 species, such as the social bumblebees, specialization in worker subcastes is not so 40 straightforward 9,10 . 41 To investigate differences in bee worker subcastes, many studies have been 42 conducted in the highly eusocial honeybee (Apis). Gene expression comparisons have 43 identified expression changes between worker behaviours 1,5,7,11,12 , which could even 44 be used to predict neurogenomic states in individual bees 13 . Similarly, profiles of 45 DNA methylation, an epigenetic marks that likely underpins gene expression 46 differences, were additionally shown to directly correlate with worker task 14,15 . 47 Certain genes were differentially methylated according to the worker subcaste and 48 foragers that were forced to revert to nursing restored more than half of the nursing-49 specific DNA methylation marks 16,17 . 50 Many of the molecular differences between honeybee workers and nurses 51 could have arisen later in the evolution of this lineage. To broadly understand how 52 For both species we used as reference a transcriptome set of 79 superTranscripts 25 , in which multiple transcripts from the same gene are represented 80 in a single sequence. B. terrestris workers had 27,987 superTranscripts of which 431 81 are potentially lncRNAs and 21,638 (77,3%) were annotated. The final T. angustula 82 assembly had 33,065 superTranscripts, and was largely complete. Indeed, 26, 623 83 superTranscripts (80.5%) had a high sequence similarity to known protein-coding 84 genes from other species in the UniRef90 database, and 347 were considered 85 lncRNAs (transcriptomes available at 86 https://github.com/nat2bee/Foragers_vs_Nurses). A summary of major quality 87 parameters from the two species datasets can be found on Table SI.  88 89

Differential expression analyses in Bombus terrestris 90
Since task division in B. terrestris workers is a plastic behaviour 9,21 , we 91 performed a principal component analysis of the normalized read counts as an 92 additional verification step to validate our sampling method. The main components 93 clearly clustered nurses and foragers samples separately ( Figure S1) indicating that 94 our sampling method was efficient to obtain two distinct groups in bumblebee 95 workers, here considered as nurses and foragers due to the activities they were 96 performing when sampled. We found 1,203 differentially expressed superTranscripts 97 between the two worker groups ( Figure S2), whereby 436 superTranscripts were more 98 highly expressed in nurses (Supplementary file S2) and 767 were more highly 99 expressed in foragers (Supplementary file S3). The majority of these superTranscripts 100 (77.3% and 72.6% respectively) have similarity to known protein-coding genes, while 101 respectively three and one are possible long non-coding RNAs (lncRNAs). Five Gene 102 transposition"; "DNA integration"; "DNA recombination"; and "pseudouridine 104 synthesis") were overrepresented among the differentially expressed superTranscripts 105 (p < 0.01; Table SII). 106 107 Differential expression analyses in Tetragonisca angustula 108 In workers of T. angustula 241 superTranscripts were differentially expressed 109 between nurses and foragers ( Figure S2). Among these, 179 had higher levels of 110 expression in nurses, being 157 genes with a significant blast hit to protein databases 111 (Supplementary file S4). Foragers had 62 superTranscripts reported as more highly 112 expressed than in nurses of which 59 were annotated (Supplementary file S5). 113 Enrichment analyses revealed 30 GO terms for biological process (BP) as enriched in 114 the tested set of differentially expressed superTranscripts when compared to the entire 115 transcriptome (p < 0.01; Table SII), including processes related to mitochondrial 116 metabolism ("aerobic respiration"; "respiratory electron transport chain"; "oxidative 117 phosphorylation" and "mitochondrial ATP synthesis coupled electron transport") and 118 other metabolic process ("lipid metabolic process" and "carbohydrate metabolic 119 process"). 120

DNA methylation in worker genes 122
Whole bisulfite sequencing (WBS) from B. terrestris and T. angustula nurses 123 were used to screen DNA methylation patterns in the entire transcriptome and among 124 the differentially expressed superTranscripts. Because T. angustula lacks a reference 125 genome and most DNA methylation reported in bees occur within gene exons 14 , we 126 performed methylation analyses by mapping bisulfite sequenced reads to the 127 https://github.com/nat2bee/Foragers_vs_Nurses) . In B. terrestris 23.14 % of all 129 cytosine sites are in CG (cytosine/guanine) context. This is a higher proportion than in 130 T. angustula where 15.44 % of all C sites available occur in CG context. We find this 131 explains the higher proportion of CG methylation observed in the bumblebee ( Figure  132 1). Nevertheless, in both species DNA methylation in CG context was enriched, that 133 is there was more DNA methylation in CG context than it would be expected simply 134 based on the proportion of sites available. Furthermore, superTranscripts general 135 methylation (mC) levels are higher in T. angustula (mean mC 1.24 %) than in B.  of methylation than the overall transcriptomic mean (Figure 2), however only in B. 149 terrestris this difference was significant (B. terrestris p = 6.267e-4, T. angustula p = 150 0.3669 at 95% CI). Interestingly, while in B. terrestris this increase was mostly due to 151 the greater methylation level of superTranscripts highly expressed in nurses; the mean 152 mC level of the highly expressed superTranscripts in B. terrestris nurses was 43.93% 153 higher than the global transcriptomic mean (p = 1.339e-06 at 95% CI). In T. 154 angustula superTranscripts highly expressed in foragers were the more methylated 155 ones (Figure 2), although still not at a significant level when compared to the general 156 mean (p = 0.05355 at 95% CI). The nucleotide context in which the methylated 157 cytosines occurred also varied in each gene subset (Figure 1). There was an overall 158 reduction in the contribution of CG methylation in the subset of differentially 159 expressed superTranscripts when compared to the entire transcriptome, except for 160 superTranscripts highly expressed in B. terrestris nurses ( Figure 1D In order to recognize species-specific from shared molecular mechanisms, 185 different strategies were used. First, we asked whether the exact same genes were 186 commonly involved in the observed subcaste differences of T. angustula and B. 187 terrestris. Comparing the two sets of differentially expressed superTranscripts we 188 identified 15 genes in common (Table I; Figure 3C), which is significantly more than 189 it would be expected by chance (p = 6e-04, mean number of genes expected 7.04, 190 SD=2.58). Interestingly, the expression pattern of these genes was not always 191 equivalent in both species (Table I) Secondly, we investigated whether the same molecular pathways could be 202 involved in the task division of the two species. For this, we searched for similarities 203 among the biological processes to which the differentially expressed superTranscripts 204 were related. We used a comparative approach based on GO subgraphs of the 205 enriched terms. This type of subgraph relies on the hierarchical graphic structure 206 among GO terms, where parent terms are more general and less specialized than child 207 terms 26,27 . Consequently, using subgraphs it is possible to compare not only the 208 enriched terms themselves but also their hierarchical connections, reducing gene 209 annotation bias 28     These hormones are important regulators in honeybee maturation affecting task division system in workers 29 . In honeybees, foragers have higher levels of JH than nurses 4,5,29 but in primitively eusocial bees, changes in JH appear not to affect worker behaviour 22 . This led to the hypothesis that JH might only be involved with age related task division 30,31 . In the present dataset we did not find any direct evidence of the involvement of JH in age related task division of T. angustula workers. This agrees with previous studies about JH in stingless bees, which have demonstrated that JH expression differences are important in differentiating queens and workers but not nurses and foragers, although titter levels of JH are significantly reduced in foragers 32 . One transcript in our dataset, highly expressed in B. terrestris foragers, was indirectly related to JH pathways, a gene predicted as "takeout-like". This gene family has been associated with multiple processes in insects, including eusocial insects, in which it has been shown to be strongly sensitive to queen pheromone 33 .

vitellogenin (vg)
This yolk precursor protein is related to egg production in many insects 34 .
In honeybees it interacts with JH in a double repressor network, and its expression is reduced in foragers 4,5,34 . For bumblebees this double repressor network apparently does not exist, instead this protein gene has been associated with worker aggression 22 and reproductive status when expressed in the fat body 35 . In our B. terrestris data, two genes highly expressed in foragers have vg transcription factor domains. As a primitively eusocial species, bumblebee workers may dispute reproductive status with queens in later stages of the colony cycle 36 . Therefore, it would be interesting to further investigate if the higher expression of these vg associated genes in foragers could be related to this behaviour. In T. angustula, we found in nurses a higher expression of one vg receptor gene indicating the relevance of this protein in this subcaste, as in honeybees. Nevertheless, it is worth noticing that stingless bees workers usually produce trophic eggs 37 , so vg might be involved in this process or even have alternative unknown roles, as suggested in 38 .
¯ For Nur

foraging (for)
Although this gene has been reported as highly expressed in honeybees 39 and bumblebees 23 foragers, there are controversial results in literature 40 about its effects. In honeybees, this gene expression was not among the best predictors of work behavioural transition 5,7 and in bumblebees its expression was higher in nurses than foragers in one study 21 . Herein, this gene was not differentially expressed in the studied species.

period (per) / circadian rhythm
The gene period is related to circadian rhythm and has been reported as overexpressed in honeybee foragers 41,42 . This specific gene does not appear among the ones differentially expressed. However, B. terrestris foragers have other highly expressed rhythm genes (protein quiver or sleepless) that are related to sleep, rhythmic process, and regulation of circadian sleep/wake cycle. Conversely no rhythm gene was associated to differentially expressed superTranscripts of T. angustula. This suggests that in B. terrestris, and in Apis, rhythm genes are more relevant to nurse/forager behavioural differences than in T. angustula.

Insulin / Insulin-like signalling (IIS)
Genes involved in this pathway are important regulators of metabolism and feeding-related behaviour in bees and other insects 40,43,44 . In Apis mellifera, this energetic pathway is related to subcaste division of workers and with lipid storage (lower levels of lipid storage increase IIS gene expression) 43 .
In both species studied here, there are differentially expressed genes between nurses and foragers related to insulin metabolism (genes containing insulin domains, transcription factor and regulators). This indicates that the insulin pathway is commonly important to worker subcaste specialization in all these eusocial bees.

Energetic metabolism
In general, genes related to energetic metabolism are expected to be involved in bee worker behaviour because feeding circuits are basal pathways to different bee activities 40,45 . Indeed, many genes related to energetic metabolism are differentially expressed between nurses and foragers of both species, being some of the GO terms commonly found in both species related to this pathway. Specific examples of genes involved in energetic pathways (besides JH and IIS) studied in honeybees are malvolio and major royal jelly proteins 46,47 . The first was not differentially expressed in our data; the second was related to differentially expressed superTranscripts in B. terrestris; in B. terrestris nurses, two highly expressed genes were predicted as protein yellow genes (which have a major royal jelly protein family domain), and in foragers other two over expressed genes had major royal jelly protein family domains.  16,17 . In the two species investigated in the present study genes possibly related to epigenetic changes were also differentially expressed. In T. angustula histone genes (H3 and H2B) and a methyltransferase were differentially expressed and in B. terrestris histone H3-K4 demethylation and lncRNAs were reported. All these genes were highly expressed in nurses, except for one lncRNA over expressed in B. terrestris foragers.

260
Our comparisons sought to differentiate species-specific from common 261 For T. angustula we found supporting evidence of the involvement of vg in nursing 284 behaviour, nonetheless JH genes were not highly expressed in foragers. This supports 285 the hypothesis that in stingless bees the typical vg/JH double repressor network 286 observed in honeybees is also not functional, and vg is distinctly regulated 32,38 . 287 Nevertheless, gene expression dynamics in worker behaviour is not 288 completely unrelated across eusocial bees. Beyond the literal expression trend, we 289 still found a significant number of genes commonly differentially expressed between 290 nurses and foragers of B. terrestris and T. angustula. Interestingly, some of these 291 common genes were also shown to be responsive to queen pherormone 33 . Genes like 292 cytochrome p450, fatty acyl-CoA, mitochondrial and histone related were found to be 293 sensitive to queen pheromone in ants and bees 33 . Moreover, biological processes 294 terms enriched in each species set of differentially expressed superTranscripts were 295 highly comparable. Our comparisons of enriched GO terms subgraphs highlighted 296 broader similarities between B. terrestris and T. angustula, indicating how distinct 297 GO terms (and genes) were involved in similar biological processes. In general, 298 biological terms related to energetic and metabolic processes ("organic substance 299 metabolic process"; "primary metabolic process"; "nitrogen compound metabolic 300 process"; and "cellular metabolic process") were central to subcaste differentiation of 301 both species. 302 The relevance of metabolic pathways to insect sociality has been demonstrated 303 in many studies over the years 45,50-52 and it is most certainly not a species-specific 304 trait. These pathways are affected by queen pheromone in different species and are 305 involved with caste determination of multiple hymenopteran lineages, including bees, 306 ants and wasps 33,53 . Given the central role of energetic and metabolic maintenance in 307 any living animal it is not surprising that changes in these pathways will affect a 308 number of features, including behavioural phenotypes. It is however fascinating to 309 observe how plastic and dynamic, in terms of gene regulation, these networks can be, Then, for T. angustula, nurses were defined by age. Brood cells (close to emergency) 422 were removed from the colonies and transferred to an incubator with controlled 423 temperature and humidity. Upon emergency, female workers were marked with 424 specific colours using a water-based ink and immediately returned to the colony. Ten 425 to twelve days after their emergency and reintroduction, colonies were opened and 426 marked individuals were sampled. During this age worker bees from T. angustula 427 present nursing behaviour 37 . Foragers were collected while leaving and returning to 428 the colonies from foraging trips. To prevent sampling of guard workers 2 , bees 429 standing in front of the colony entrance were avoided. Some of the foragers were 430 collected before nurse sampling and others after this period, but no foragers were 431 sampled while nurses were marked and collected so as to avoid effects of colony 432 disturbance in the worker behaviour. Nurses from different colonies were collected in 433 different days. 434 All individuals were sampled between 10h-12h for both species, and entire 435 worker bodies were used for RNA and DNA extraction. For RNA-Seq, six T. Transcriptome assembly and differential expression analyses and comparisons 454 (v0.11.2) before and after cleaning. The FASTX Toolkit 67 (v0.0.14) was used to trim 456 the first 14 bp of all reads because an initial GC bias 68 was detected. Low quality 457 bases (phred score below 30) and small reads (less than 31 bp) were removed using 458 independently and compared posteriorly, as illustrated in Figure S6 was performed using the R package TopGO 28 . Species comparisons of differentially 500 expressed genes was based on gene annotation, using only unique and non-redundant 501 terms (i.e. those genes not containing "uncharacterized protein" in their annotation). 502 The list of overlapping genes was then manually curated to remove annotation 503 incoherencies not detected computationally, i.e. when gene lists from both species 504 were compared with our R script 18 terms were common, after manual curation we 505 removed three genes from this list because of partial or redundant annotation matches 506 ("transposase", "transporter" and "cytochrome c oxidase subunit [fragment]"), leaving 507 15 genes in common. In the random sampling statistics this manual filtering 508 correction was not used, so the numbers of common genes obtained with the 509 computational comparison were used. Comparisons between the set of GO enriched 510 terms and subgraphs was manual. The similarity network parameters was estimated 511 with REVIGO 87 using Medium (0.7) similarity threshold. In the interactive network 512 mode of this program, the input data for Cytoscape 88 was downloaded for further 513 figure edition. Statistical tests of significance for comparisons were based on random 514 sampling using R 89 scripts, p-value smaller than 0.01 were considered significant. 515 Scripts used are available at https://github.com/nat2bee/Foragers_vs_Nurses. 516

DNA methylation analysis 518
Cleaning and adapter trimming of the bisulfite converted reads were 519 performed using Trim Galore 90 (v 0.4.3) wrapper script with default parameters. 520 Complete transcriptome assemblies were used as reference so DNA methylation of 521 coding regions could be analysed, since these regions are the main methylation targets 522 in bees and other Hymenoptera 14 . PCR bias filtering, alignment of the cleaned reads 523 and methylation call were performed using the BS-Seeker2 91 (v 2.1.0), because this 524 program allows the use of Bowtie2 in local alignment mode, which was necessary to 525 properly align WBS reads to a transcriptome. CGmapTools 92 (v 0.0.1) was used to 526 filter low coverage methylated sites (< 10x) and to obtain DNA methylation statistics, 527 including context use. Remaining statistical tests were performed using R, as follows: 528 a random sampling test was used to verify whether the proportion of CG methylation 529 found deviated from what was expected by chance; one-tailed z-test was used to test 530 whether differences between the mean methylation observed in the set of 531 superTranscripts was different from the general transcriptomic mean; and the 532 correlation between methylation and gene expression was calculated using 533 Spearman's correlation coefficient between the superTranscript mean methylation and 534 its normalized read count. Scripts used are available at 535 https://github.com/nat2bee/Foragers_vs_Nurses. 536 biology of social behavior. Bioessays 19, 1099-1108 (1997).