Gene co-expression networks identify novel candidate genes for moulting and development in the Atlantic salmon louse (Lepeophtheirus salmonis)

Background The salmon louse (Lepeophtheirus salmonis) is an obligate ectoparasitic copepod, living on Atlantic salmon and other salmonids in the marine environment. Salmon lice cause a number of environmental problems and lead to large economical losses in aquaculture every year. In order to develop novel parasite control strategies, a better understanding of the mechanisms of moulting and development of the salmon louse at the transcriptional level is required. Methods Three weighted gene co-expression networks were constructed based on the pairwise correlations of salmon louse gene expression profiles at different life stages. Network-based approaches and gene annotation information were applied to identify genes that might be important for the moulting and development of the salmon louse. RNA interference was performed for validation. Regulatory impact factors were calculated for all the transcription factor genes by examining the changes in co-expression patterns between transcription factor genes and deferentially expressed genes in middle stages and moulting stages. Results Eight gene modules were predicted as important, and 10 genes from six of the eight modules have been found to show observable phenotypes in RNA interference experiments. We knocked down five hub genes from three modules and observed phenotypic consequences in all experiments. In the infection trial, no copepodids with the RAB1A-like gene knocked down were found on fish, while control samples developed to chalimus-1 larvae. Also, a FOXO-like gene obtained highest scores in the regulatory impact factor calculation. Conclusions We propose a gene co-expression network-based approach to identify genes playing an important role in the moulting and development of salmon louse. The RNA interference experiments confirmed the effectiveness of our approach and demonstrated the indispensable role of RAB1A-like gene in the development of salmon louse. In addition to salmon louse, this approach could be generalized to identify important genes associated with a phenotype of interest in other organisms.

Background 1 biological systems [21]. Recently, a great deal of attention has been devoted to the 45 area of network-based analysis. Network analysis provides a powerful framework for studying a large number of interactions among biological processes and components. 47 Gene co-expression networks (GCNs) have been widely used to capture and mine 48 the interactions among components of the transcriptome [21,22]. 49 Signatures of hierarchical modularity have been suggested to be present in all 50 cellular networks investigated so far, ranging from metabolic to protein-protein in- "important genes", and we proposed a workflow to identify these important mod-module was computed as the eigenvector for the largest eigenvalue of the module 129 gene expression matrix by the function moduleEigengenes in WGCNA. 130 Intramodular centrality measurements and intramodular hub identification 131 In this study, we adopted three types of centrality measurements to measure the 132 centralities of nodes within each module and identified intramodular hubs. 133 Intramodular Connectivity (kIM) 134 The connectivity of the ith node (k i ) in the weighted network is defined as the sum 135 of connection weights between node i and the other nodes [37]: ( Suppose that there are Q modules detected in a network, and they are labeled 137 by q = 1, 2, . . . Q, so the connectivity of a node i within a module q is defined as 138 intramodular connectivity (k where M q denotes the set of node indices that correspond to the nodes in module 140 q, and A (q) is the adjacency matrix of module q. High intramodular connectivity 141 implies that a node could be a hub within the module.
In addition, the module membership of a node for module q can be calculated for 152 all nodes in the network: 153 kM Eall and this definition can be used in the module preservation analysis. The details can 154 be found in Additional file-1.
155 Intramodular weighted betweenness centrality (BC) 156 The betweenness centrality of a node in an unweighted network (or module) is the We used the R package glmnet [43] to perform this analysis, and we adopted the 197 λ that gives minimum mean cross-validated error.
Integrating information from external databases and enrichment analysis 199 Data from FlyBase [44,45] and GenomeRNAi [46] were extracted and used to iden-200 tify homologous observable phenotypes and lethal phenotypes enriched modules. 201 To detect homologous sequences in Drosophila melanogaster, we ran BLASTP 202 with E-value cutoff as 1e − 10 on the corresponding protein sequences of salmon 203 louse transcripts against protein sequences from Drosophila. Only best hits were 204 considered. After mapping the protein IDs of the homologues from Drosophila to 205 gene IDs, RNAi knock-down phenotype information were mapped to data from 206 GenomeRNAi. If a salmon louse protein had more than one Drosophila homologue 207 with identical maximum bitscore, all the homologues were used to search for RNAi 208 phenotypes. BLASTP searches of all salmon louse predicted amino-acid sequences 209 were performed to find paralogues.  with an adjusted p-value smaller than 0.05 were identified as differentially expressed 257 between middle instar ages and old/moulting instar ages.

258
The first RIF value (RIF 1) for f th TF transcript was defined as: where ro if and rm if are the co-expression correlation between f th TF and the ith 265 DE transcript in the old/moulting samples and the middle samples, respectively.

266
The second RIF value (RIF 2) was computed as followed: ica RM 2165) and stained with toluidine blue (1% in 2% borax) for one minute.

346
Identification of moulting-associated and TF genes 347 Among the transcripts in our RNA-seq data, we found 40 moulting-associated  Table 1.

351
There were 433 transcripts annotated as TF, and 231 TF were retained after low 352 expression filtering (Additional file 2- Table S8).

354
To detect the genes that might be involved in the regulation of moulting and devel-  ing network, and global network, respectively. 366 We set the power parameter β as 7 to make sure that the networks satisfied  and EMLSAT00000012651) (Additional file 2- Table S6). Notably, the moulting-402 associated transcripts were also identified as intramodular hubs in these modules.

403
The transcript EMLSAT00000008812 (annotated to encode chitinase, Table 1) was 404 ranked eighth (based on connectivity) in the "yellowgreen" module, and the tran-405 script EMLSAT00000012651 (annotated as EcR, Table 1) was ranked third (based 406 on betweenness centrality) in the "lavenderblush3" module. We thus hypothesized 407 that transcripts in these two modules could be important for salmon louse moulting, 408 and hubs from these modules should be considered as important.

409
Eight modules from the middle network were identified as non-preserved, and 410 the module sizes ranged from 54 to 109 (Additional file 2- Table S2). Three non-411 preserved modules (darkseagreen4, brown4 and lightcyan1) were found containing 412 one moulting-associated transcript (Additional file 2- Table S5). However, none of 413 these moulting-associated transcripts were intramodular hubs in the middle net-  Table S4). It was noteworthy that module "steelblue" possessed the 425 largest positive coefficient and contained one moulting-associated transcript (EML-426 SAT00000001150) as intramodular hub, which was ranked second in the betweenness 427 centrality measurement (Additional file 2- Table S7).

428
When checking the absolute value of regression coefficients, three modules (ma-   Details of the three selected modules can be found in Table 2 and Additional file 2.

524
For the module "yellowgreen" and "violet" from the moulting network, we chose 525 the absolute hub for RNAi experiment. According to the criteria discussed in the 526 method section, we chose another one hub without paralogues from each module to 527 knock down.

528
No absolute hub was found in the module "steelblue" from the global net-   Table 3.  iments results from this study). One gene from the selected module "darkoliveg-594 reen" had been knocked down, but no phenotype was observed. No RNAi results 595 were found for the genes in the selected module "lavenderblush3" (Additional file 596 2- Table S13).

597
Notably, one hub (EMLSAG00000009839) from one non-preserved module (sky- shows that moulting-associated genes in the module "yellowgreen" and "steelblue" 626 obtained relatively high average ranks, and they were tightly connected with other hubs. In the module "violet", the moulting-associated gene obtained a low aver-628 age rank, but the genes annotated as TF obtained high average ranks and were 629 tightly connected with other hubs. For these modules, the proportion of differen-630 tially expressed genes was highest in the module "steelblue" identified from the 631 global network.

632
The enriched GO terms in the module "yellowgreen" included GO:0008152, ules. However, these modules passed none of our criteria for module selection, and negative results had been recorded. We thereby conclude that not all the intramod-721 ular hubs may be equally important, even if they have highest ranks, supporting the 722 need for an initial step of module selection. Combining our RNAi results and public 723 records, we argue that our rational approach is more likely to yield genes with mea-724 surable phenotypic effect under ablation of gene expression than random selection.

725
Nonetheless, more work is needed to affirm the relationship between centrality and 726 gene essentiality in this organism.

727
In addition to demonstrating the biological importance of intramodular hubs,

728
RNAi experiments also highlight the role of our selected genes in moulting and  Figure 1 Grouping of sample data and photographs of representative L. salmonis chalimus-1, chalimus-2 and preadult-1 larvae. Within each stage, lice were divided into groups of same instar age: directly after moulting (young), in the middle of the stage (middle) and directly before the moult to the next stage (old/moulting). Moults are represented by a green arrow and a shedded exoskeleton. In this study, data from lice of the middle and old/moulting instar age were used.      Tables  Table 1 ID, annotation and relevant publications for the moulting-associated transcripts   TranscriptID Annotation from LiceBase Publication EMLSAT00000012651 Nuclear    Table 4 Observed phenotypes by RNAi for the selected knock-down candidates. All phenotypes were assessed in adult female lice after injection of dsRNA at the pre-adult 2 stage except for *: phenotypic change was observed at larval stages after dsRNA treatment of nauplii (see Figure 4) .