Meta-analysis of public RNA sequencing data of queens and workers in social Hymenoptera and termites

Simple Summary: Social insects form complex, sophisticated societies in which members work cooperatively. Ants, bees, wasps, and termites are examples of social insects. The key characteristic of their structured societies is the reproductive division of labor. The queens reproduce within the nests, whereas workers engage in other tasks for colony development. Although gene expression profiles associated with the reproductive division of labor have been revealed for each social spe-15 cies, the existence of core regulatory genes commonly shared by all or multiple social insects is largely unknown. To address this issue, we conducted a meta-analysis by collecting and reanalyz-17 ing massive amounts of data from social bees, social wasps, ants, and termites. We identified 20 genes with queen- and worker-biased expression. Twelve of these genes have not been previously associated with reproductive division of labor. The expression of these genes is likely necessary for social insects to exhibit reproductive division of labor. Abstract: Eusociality in insects has evolved independently many times. One of the most notable characteristics of eusociality is the reproductive division of labor. In social insects, the reproduc-24 tive division of labor is accomplished by queens and workers. Transcriptome analyses of queens and workers have been conducted for various eusocial species. However, the genes that regulate 26 the reproductive division of labor across all or multiple eusocial species have not yet been fully 27 elucidated. Therefore, we conducted a meta-analysis using publicly available RNA-sequencing da-28 ta from four major groups of social insects. In this meta-analysis, we collected 258 pairs (queen vs. 29 worker) of RNA-sequencing data from 34 eusocial species. The meta-analysis identified a total of 30 20 genes that were differentially expressed in queens or workers. Out of these, 12 genes have not 31 previously been reported to be involved in the reproductive division of labor. Functional annota-32 tion of these 20 genes in other organisms revealed that they could be regulators of behaviors and 33 physiological states related to the reproductive division of labor. These 20 genes, revealed using massive datasets of numerous eusocial insects, may be key regulators of the reproductive division 35 of labor.


39
Organisms have undergone several evolutionary transitions from cellular to multi-40 cellular organisms and their organized societies [1]. Sociogenomics aims to comprehen-41 sively understand social life from a molecular perspective [2]. Eusociality is the most ad-42 vanced social form, characterized by a high level of cooperation among members of so-43 ciety. An essential characteristic of eusociality is the reproductive division of labor, 44 which is accomplished by reproductive and non-reproductive castes. Reproductive 45 castes are physiologically and behaviorally specialized in reproduction, whereas non-46 reproductive castes engage in brood care or foraging. The division of labor is believed to 47 increase the overall productivity of eusocial colonies [3].
Eusociality, which is based on the reproductive division of labor, has evolved inde-49 pendently in several insect lineages [4,5]. Eusocial Hymenoptera (bees, ants, and wasps) 50 and termites are extensively studied insects that have complex and sophisticated socie-51 ties. The reproductive and non-reproductive castes of these social insects are referred to 52 as queen/king and workers/soldiers, respectively [6]. The investigation of genomic 53 changes underlying the convergent evolution of eusociality among several lineages (bees 54 and ants) has identified the genes that molecular evolution has occurred with increased 55 social complexity [7][8][9][10]. Transcriptome analysis using the pharaoh ant Monomorium 56 pharaonis and the honeybee Apis mellifera revealed that convergent eusocial evolution is 57 based on the expression of highly conserved reproductive and lineage-specific genes [11]. 58 In addition, many transcriptome analyses of queens and workers have been conducted 59 for each species of bee, ant, wasp, and termite . However, no comprehensive tran-60 scriptome analysis has been performed to understand the regulatory mechanisms of re-61 productive division of labor across a wide range of eusocial species. 62 A meta-analysis that combines transcriptomic data obtained from multiple studies 63 is effective identifying novel gene expression associated with specific biological process-64 es [33][34][35]. Additionally, it is possible to identify master regulatory genes by combining 65 transcriptome data from a wide range of species. We developed meta-analysis methods 66 for publicly available transcriptome data from humans, plants, and insects [33,[36][37][38][39]. In 67 this study, we used a meta-analysis method to identify regulatory genes related to the 68 reproductive division of labor in ants, bees, wasps, and termites. We collected RNA se- 69 quencing data for 258 pairs (queens vs. workers) from 34 eusocial species. Meta-analysis 70 detected 212 genes that were conserved in all species examined in this study, 20 of these 71 212 genes were significantly differentially expressed between queens and workers. 72 73 74 2.1. Curation of public RNA sequencing data 75 Accession IDs (Project ID) related to eusociality were searched from the public 76 databases All Of gene Expression (AOE) [40] and DBCLS SRA (https://sra.dbcls.jp/ 77 (accessed on 11 May 2022)) using the keywords caste, eusocial, eusociality, and sociality. 78 Metadata was obtained using NCBI SRA Run Selector 79 (https://www.ncbi.nlm.nih.gov/Traces/study/ (accessed on 12 May 2022)). The 80 acquisition of RNA sequencing data was limited to species for which the genome or 81 reference transcriptome was available. RNA sequencing data from each species were 82 obtained in pairs (queen and worker) to investigate gene expression related to the 83 reproductive division of labor. 84 2.2. RNA-seq data retrieval, processing, and quantification 85 SRA retrieval and conversion to FASTQ files were performed using the prefetch 86 and fasterq-dump programs in the SRA Toolkit (v3.0.0) in the Docker environment 87 (v3.3.1) [41]. To decrease the disk space usage, the obtained FASTQ files were immedi-88 ately compressed using pigz (v. 2.4). For trimming and quality control, Trim Galore! 89 (v.0.6.6) [42] and Cutadapt (v.3.4) [43] software were used. Trim Galore! was run with 90 the parameter fastqc --trim1--paired (if data were paired-end). Trim Galore! included 91 Cutadapt (v.3.4) to trim low-quality base calls. Because all the data passed through this 92 quality check, low-quality data were not included in this study. Biological replicates 93 were treated as one experiment, and technical replicates were combined into one expres-94 sion dataset. 95 The files of the reference genome and transcripts used for quantifying gene 96 expression levels are listed in Transcript expression was quantified using Salmon (v.1.5.0) [44] with the parameters -i 100 index -l IU. Gene expression levels were quantified using transcripts per million (TPM). 101 For species with only genome sequences, RNA sequencing reads were mapped to the 102 reference genome using HISAT2 (v.2.2.1) [45] with the parameters -q --dta -x, and then 103 quantification of the mapped data was performed using StringTie (2.1.5) [46]. TPM was 104 used as a quantitative value of the gene expression level. 105 2.3. The detection of genes related to the reproductive division of labor 106 A calculation method developed for the meta-analysis of RNA sequencing data was 107 employed [33,36-39] as described below. Pairs of the queen and worker RNA sequenc-108 ing data (Table S2: https://doi.org/10.6084/m9.figshare.21524286.v1) were used to calculate 109 the queen/worker ratio of TPM for each transcript (referred to as the QW ratio). The QW 110 ratio for each transcript was calculated using Equation (1).

Materials and Methods
This pairwise calculation was performed for all the pairs described in Table S2 112 (https://doi.org/10.6084/m9.figshare.21524286.v1). The value "0.01" was added to TPM to 113 convert zero into a logarithm. QW ratios were classified as "upregulated," "downregu-114 lated," or "unchanged" according to the thresholds. Transcripts with a QW ratio greater 115 than 0.58496250072116 (1.5-fold expression change) were treated as "upregulated." 116 Transcripts with QW ratios less than -0.58496250072116 (1/1.5-fold expression changes) 117 were treated as "downregulated." Others were treated as "unchanged." The QW score 118 was calculated to identify the genes involved in the reproductive division of labor. The 119 QW score was calculated by subtracting the number of pairs with "downregulated" 120 from those with "upregulated, " as shown in Equation (2):

121
QW score = count number *+,-.*/01-2 − count number 2345,-.*/01-2 , To determine homologous relationships, the protein BLAST (BLASTP) program in 126 the NCBI BLAST software package (v2.6.0) was used with parameters -evalue 1e-10 -127 outfmt 6 -max_target_seqs 1. The transcript files used for protein translation are listed in 128 Table S1 (https://doi.org/10.6084/m9.figshare.21524286.v1). For species with only ge-129 nome sequences, GFFread (v0.12.1) [47] was used to extract transcript sequences from 130 the genome sequence. TransDecoder (v5.5.0) (https://transdecoder.github.io/ (accessed 131 on 22 Jun 2021)) was used to identify and translate the coding regions into amino acid 132 sequences. Open reading frames were extracted using TransDecoder.Longorfs with pa-133 rameters -m 100, and coding regions were predicted using TransDecoder.Predict with 134 the default parameters. As a result of the above processes, the transcript IDs of all spe-135 cies were converted to the protein IDs of A. mellifera. The protein IDs of A. mellifera were 136 then converted to those of other species using the annotation table for transcripts of A. 137 mellifera [48]. Transcripts and amino acid sequences created in this study can be down-138 loaded from figshare (Supplemental File S1). GSEA was performed using the Metascape 139 software [49]. sect species ( Figure 1A, Table S2; https://doi.org/10.6084/m9.figshare.21524286.v1). Ants 147 (Harpegnathos saltator and Monomorium pahraonis), termites (Cryptotermes secundus), hon-148 eybees (A. mellifera), and wasps (Polistes dominula) were among the higher ranked species, 149 indicating that RNA sequencing data from a wide range of eusocial species were includ-150 ed in the dataset. When compared among the tissues, brain-derived RNA sequencing 151 data were the most abundant ( Figure 1B).  The expression ratio (queen vs. worker) (ratio) was calculated for each species. At 157 this point, the transcript IDs in the QW ratio table differ from species to species, even for 158 homologs. Therefore, we converted the transcript IDs of each species to the protein IDs of 159 A. mellifera, and then merged the QW-ratio tables of each species. We obtained QW-ratio 160 values for 212 genes in 258 pairs of RNA sequencing data (Table S3: 161 https://doi.org/10.6084/m9.figshare.21524286.v1). The QW score for each gene was then 162 calculated based on the QW ratio table. The QW score for each gene was calculated by 163 subtracting the number of pairs with downregulated (less than 1/1.5-fold change) gene 164 expression from the number of pairs with upregulated (more than 1.5-fold change) gene 165 expression. This method provides the overall trends of gene expression changes in 258 166 pairs of RNA sequencing data. High scores indicate highly expressed genes in the queen, 167 while low scores indicate highly expressed genes in the workers. The 212 identified genes 168 were ranked based on their QW-score values (  Table S4 because they showed abrupt score changes 174 ( Figure 2 and Table 1). Abrupt score changes indicated that their expression was remark-175 ably variable in the dataset used in this study.  To estimate the function of these genes, the information of tissue-expression profile in D. 184 melanogaster were added (Table 1). Nine out of ten highly expressed genes in workers 185 were found primarily in the brain and thoracoabdominal ganglia in D. melanogaster, 186 while highly expressed genes in queens were found in various tissues. If high expression 187 has already been reported in queens or workers, caste and species names are noted. In 188 addition, we added the biological/molecular functions predicted by the cited literature. 3.3. GSEA 190 We selected the top 30 best-ranked and worst-ranked genes based on the QW score, 191 and then performed GSEA using the corresponding IDs of D. melanogaster and H. sapiens 192 (Figure 3). No GO terms with low p-values were detected when D. melanogaster IDs were 193 used ( Figure 3AB). However, when H. sapiens IDs were used, rRNA processing was sig-194 nificantly enriched in the top 30 genes ( Figure 3C), whereas the positive regulation of ion 195 transport and modulation of chemical synaptic transmission were enriched in the last 30 196 genes ( Figure 3D).  In this study, we aimed to identify genes that regulate the reproductive division of 202 labor across all or multiple eusocial insects by performing a meta-analysis of RNA-seq 203 data. Meta-analysis revealed 20 genes with differential expression between queens and 204 workers. Among these genes, vitellogenin and vitellogenin receptors, which are highly 205 expressed in queens across many social insects [29,[50][51][52][53][54][55][56], showed the highest QW score 206 (Figure 1). This suggests that the current meta-analysis method is suitable for identifying 207 differences in gene expression between queens and workers.

208
Among the genes identified in this study, we focused on 20 genes with the highest 209 or lowest QW scores (Table 1). As the results of this study represent a vast amount of da-210 ta from a wide variety of eusocial species, it is likely that these genes are the key regula-211 tors underlying the reproductive division of labor in many social insects. In addition, 212 several of the 20 genes were not previously implicated in the reproductive division of 213 labor, providing novel insights into the regulatory mechanisms of the reproductive divi-214 sion of labor. In the following sections, we summarize the biological and molecular func-215 tions of the corresponding homologs of these 20 genes in other organisms and discuss 216 their role in the regulatory mechanisms of the reproductive division of labor. Vitellogenin is a precursor protein of the egg yolk that is taken up by oocytes via re-221 ceptor-mediated endocytosis [57,58]. These genes are highly expressed in the reproduc-222 tive castes of various insects [29,[50][51][52][53][54][55][56]. JH generally acts as a gonadotropic hormone in 223 eusocial and solitary insects [59][60][61][62][63]. Cyp305a1, which encodes JH epoxidase, is required 224 for JH biosynthesis in D. melanogaster [64]. Apolpp is a JH-binding protein in the cock-225 roach Blattella germanica [65], and is upregulated in the termite queens, Reticuliterms sper-226 atus [30]. These genes may contribute to queen ovarian development via an upregulated 227 JH titer. However, in the honeybees A. mellifera and the ant H. saltator, the gonadotropic 228 function of JH was lost [66,67]. The gonadotropic effects of these genes may depend on 229 species.

230
Exuperantia is a maternal factor that is involved in oocyte formation [68]. In social in-231 sects, it is reported to be expressed in the reproductive workers of the honeybee A. mel-232 lifera [69] and queens of the ant Temnothorax longispinosus [70]. This gene is likely in-233 volved in the formation of normal oocytes in both social and solitary insects.

236
CG1671 was orthologous to human transducin beta-like 3 (TBL3/UTP13). UTP-13, 237 which is a component of the UTP-B complex, binds to pre-rRNA to form 90S pre-238 ribosomes, followed by pre-RNA cleavage [71]. TFIIE, a general transcription factor, is 239 required for initiation of transcription by RNA polymerase II. Additionally, TFIIE affects 240 ribosomal assembly and function [72]. Defects in ribosomal synthesis or function are im-241 plicated in various disease states known as 'ribosomopathies' in humans [73]. Conse-242 quently, CG1671 and TFIIEalpha may affect ribosomal synthesis and performance, which 243 may influence the physiological state of the queen.

244
Genes involved in rRNA processing were enriched in the results of GSEA, support-245 ing the importance of rRNA processing as a regulator of the reproductive division of la-246 bor. In this study, SPARC was highly expressed in queens, and RSG7 was highly ex-251 pressed in workers. In mice, SPARC promotes insulin secretion via downregulation of 252 regulator of G protein 4 (RGS4) expression in pancreatic β cells [74]. It also may regulate in-253 sulin secretion in eusocial insects via downregulation of RSG7 expression. The expression 254 of insulin-like peptide (ILP) is upregulated in the queens of several ants and in the ter-255 mite Macrotermes natalensis [75,76]. In A. mellifera, however, ILP expression is lower in the 256 old queen than in the old workers [77]. Therefore, the relationship between SPARC and 257 RGS may possibly not apply to all eusocial species. l(2)37Cb is orthologous to human DEAH-Box Protein 16 (DHX16), which encodes 261 an ATP-dependent RNA helicase that is recruited to Complex B in the spliceosome [78]. 262 The CG31712 protein interacts with the spliceosome complex, including l(2)37Cb [79]. 263 These genes may be related to differences in alternative splicing patterns between 264 queens and workers. 265 266 4.5. Immunoglobulin superfamily (beat and dpr1) 267 Immunoglobulin superfamily proteins have been implicated in neuron-neuron and 268 neuron-muscle interactions [80]. dpr1 (defective proboscis extension response 1) is re-269 quired for salt aversion in D. melanogaster [81]. beat (beaten path Ib) is necessary for the 270 formation of normal nerve branches in the muscle [82]. In contrast to queens, workers ac-271 tively engage in foraging and brood care. These genes may be involved in modifying the 272 nervous system required for workers to engage in the colony task. Syt1, which encodes a synaptic vesicle calcium-binding protein, is required for syn-276 aptic vesicles to interact with the plasma membrane [83]. Ca2 + ions act on molecular 277 complexes, including Syt1, triggering rapid exocytosis at the synapse. Syt1 is highly ex-278 pressed in ants (S. invicta), honeybees (A. mellifera), and wasps (P. metricus) [84]. Syt1 279 might regulate the secretion of neurotransmitters that cause foraging and brood care. 280 Serotonin secretion regulates worker responsiveness to trail pheromones in the ant Phei-281 dole dentata, leading to cooperative foraging behavior [85].

282
Amon, which encodes a member of the prohormone convertase family, is required 283 for propeptide processing to form bioactive neuropeptides [86]. In D. melanogaster, amon 284 protein is an important enzyme that catalyses many neuropeptides in the larval corpora 285 cardiaca and perineuronal organs of the thorax and abdomen. Corazonin is a neuropep-286 tide whose production is dependent on ammonia catalysis [86]. Corazonin is a neuropep-287 tide that is highly expressed in the workers of three ants (Harpegnathos saltator, Campono-288 tus floridanus and Monomorium pharaonis) and one wasp (Poistes canadensis) [15]. In addi-289 tion to corazonin, amon may regulate the production of neuropeptides that regulate 290 worker behavior. qvr, which encodes a glycosylphosphatidylinositol (GPI)-anchored membrane pro-294 tein, is a sleep-promoting factor in D. melanogaster [87]. Gat (GABA transporter) encodes a 295 protein belonging to the solute carrier transporter family. Mutations in Gat cause exces-296 sive sleep because of the reduced uptake of GABA (a sleep inhibitor) from synapses to 297 astrocytes in D. melanogaster [88]. Gat and quiver are reported to be highly expressed in 298 worker foragers of the bumblebees B. terrestris and A. mellifera, respectively [69,89]. In 299 honeybees, A. mellifera, sleep deprivation negatively affects the accuracy of the waggle 300 dance, which informs colony members of the location of food resources [90]. It is possible 301 that the amount of sleep of workers was optimized by Gat and quiver. It is unknown what 302 effect excessive sleep has on the task performance of workers, but it is believed that the 303 longer workers sleep, the less time they spend foraging and providing brood care for the 304 colony.

305
CG30158 is homologous to mammalian Dexras1, which encodes a Ras-like G protein. 306 Dexras1 is required for entrainment of circadian rhythms in the environmental light cycle 307 in mice [91,92]. In worker foragers of the honeybee, A. cerana japonica, the circadian 308 rhythm of locomotion activity is synchronized with light and temperature [93]. The A. 309 mellifera foragers are often inactive at night [94]. In addition, the sleep rhythms of A. mel-310 lifera foragers are also known to be synchronized with the timing of the food available 311 [95]. In social insects, the homolog of CG30158 may play an important role in synchroniz-312 ing foraging timing with environmental cycles.

Characteristics of genes highly expressed in workers 315
Tissue expression profiles in D. melanogaster were referred for 20 listed genes using 316 FlyAtras2 [96] (accessed on 14 Oct 2022). Unlike the genes significantly expressed in 317 queens, many of the gene sets upregulated in workers were expressed in the nervous sys-318 tem of adult D. melanogaster (Table 1). Enrichment analysis showed that genes involved 319 in the modulation of chemical synaptic transmission (GO:0050804) were enriched in the 320 gene set that is highly expressed in workers, providing evidence for active gene expres-321 sion in the nervous system of workers.

324
This study is the first meta-analysis conducted on massive datasets of numerous 325 eusocial species in two phylogenetically distant lineages, social Hymenoptera and ter-326 mites, and provides two novel findings. First, the meta-analysis found 12 genes that had 327 not been reported to be associated with the reproductive division of labor. This is be-328 cause meta-analysis removes the data bias that occurs in the results of individual studies. 329 Second, the 20 genes retrieved from massive datasets of a large number of eusocial spe-330 cies may be key regulators of the reproductive division of labor. Functional analyses of 331 the genes identified in this study are expected in the future. As there is increase in the 332 publicly available RNA sequencing data, meta-analyses should prove useful in provid-333 ing new insights into targeted biological phenomena, despite the fact that meta-analysis 334 of RNA sequencing data is hardly widespread in entomology. The present study 335 demonstrated the effectiveness of a meta-analysis of RNA sequencing data.      Comparative