Abstract
Feather pecking (FP) is a damaging non-aggressive behavior in laying hens with a heritable component. Its occurrence has been linked to the immune system, the circadian clock, and foraging behavior. Furthermore, dysregulation of miRNA biogenesis, disturbance of the gamma-aminobutyric acid (GABAergic) system, as well as neurodevelopmental deficiencies are currently under debate as factors influencing the propensity for FP behavior. Past studies, which focused on the dissection of the genetic factors involved in FP, relied on single nucleotide polymorphisms (SNPs) and short insertions and deletions < 50 bp (InDels). These variant classes only represent a certain fraction of the genetic variation of an organism. Hence, we re-analyzed whole genome sequencing data from two experimental populations, which have been divergently selected for FP behavior for over more than 15 generations, and performed variant calling for structural variants (SVs) as well as tandem repeats (TRs) and jointly analyzed the data with SNPs and InDels. Genotype imputation and subsequent genome-wide association studies in combination with expression quantitative trait loci analysis led to the discovery of multiple variants influencing the GABAergic system. These include a significantly associated TR downstream of the GABA receptor subunit beta-3 (GABRB3) gene, two micro RNAs targeting several GABA receptor genes, and dystrophin (DMD), a direct regulator of GABA receptor clustering. Furthermore, we found the transcription factor ETV1 to be associated with differential expression of 23 genes, which points towards a role of ETV1, together with SMAD4 and KLF14, in the disturbed neurodevelopment of high feather pecking chickens.
Introduction
Feather pecking (FP) in chickens is a behavioral disorder, which severely impacts animal welfare and causes significant economic losses. It has been proposed that FP is obsessive compulsive-like behavior. [3]. In the past, the damage has been controlled by beak trimming, which has now been prohibited in many European countries. Numerous studies found an involvement of environmental factors such as light intensity, nutrition, stocking density, and lack of foraging (reviewed by [4] and [5]). Furthermore, evidence has been accumulating that the immune system plays a major role in the development of FP behavior [1, 6–8]. The gut microbiota have also been proposed to be involved in FP behavior but this has been disproven in several studies [9, 10]. Since feather pecking is a complex heritable trait (reviewed by [11]) the dissection of the genetic causes is essential for the development of effective breeding strategies to eradicate the causative alleles. To achieve that goal chickens were divergently selected for FP behavior over more than 15 generations based on their estimated breeding values for the behavior. Breeding of these lines was initiated in Denmark and continued in Hohenheim, Germany. In Hohenheim two populations were established – an F2 cross and 12 half-sib families (in the preceding text referred to as F2 and HS). A detailed description of the experimental populations and the research conducted with them was reviewed by Bennewitz and Tetens [12]. Based on the results that were acquired by whole-genome and transcriptome sequencing with the Hohenheim selection lines, FP appears to be a disorder of the γ-aminobutyric acid (GABA) system in conjunction with a disturbance of embryonic neurodevelopment by a lack of leukocytes in the developing brain. Several variants in or in close proximity to GABA receptor genes were identified in genome-wide association studies (GWAS) conducted on medium density single nucleotide polymorphism (SNP) array and imputed sequence level genotypes [2, 13, 14]. Furthermore, brain transcriptome analysis of high and low feather peckers (HFP and LFP) before and after light stimulation revealed that HFP respond with very few changes in gene expression in comparison to LFP with numerous GABA receptor genes upregulated in LFP. Only GABRB2 (Gamma-Aminobutyric Acid Type A Receptor Subunit Beta2) was among differentially expressed genes (DEGs) in HFP but downinstead of upregulated – a pattern that we observed for most DEGs in HFP brains [14]. We attribute this low level of gene expression changes in response to light to a high level of excitation in HFP brains due to the lack of multiple GABA receptors. Since GABA is the major inhibitory neurotransmitter a lack of its receptors in the brain would lead to a high neuronal excitatory state, which explains the observed hyperactivity and the obsessive compulsive-like behavior observed in HFP. Since Dicer1 was among the downregulated DEGs we assume that miRNA processing is disturbed, as it is also the case in schizophrenia patients [15], which in turn leads to low GABA receptor expression levels. In the general comparison of brain transcriptomes between HFP and LFP hens, we observed an enrichment of immune system-related DEGs [1], which we could further pinpoint in an expression quantitative trait loci (eQTL) analysis to a small deletion 652 bp downstream of the KLF14 gene. In total, the differential expression of 40 genes between HFP and LFP chickens was significantly associated with this KLF14 variant, a majority of which are involved in leukocyte biology [8]. It has been shown in mice that CD4 T cells are essential for healthy development from the fetal to the adult brain. A defect in CD4 T cell maturation affected on synapse development and lead to behavioral abnormalities [16]. The evidence suggests that this mechanism is responsible for disturbed embryonic brain development in HFP chickens, which contributes to feather pecking behavior.
One commonality of all the studies that we conducted on the genetics involved in FP is the overlap in associated genes with human psychiatric disorders, most prevalent schizophrenia. Structural genomic variants (SVs) play a notable role in human psychiatric disorders [17], which is also the case for tandem repeats (TRs) [18]. Commonly investigated classes of SVs include insertions, deletions, duplications, inversions, and translocations, which arise from various combinations of DNA gains, losses, or rearrangements [19]. Here we present the first in-depth study on the potential role of SVs and TRs in FP, which led to a deeper understanding of the mechanisms responsible for this behavioral disorder.
Material and methods
Animals and husbandry
All experimental procedures [1], rearing and husbandry conditions [20] as well as phenotyping [2] were described in previous studies. Briefly, the phenotypic data were generated by direct observations made by seven independent investigators. The phenotypes are expressed as the number of FP bouts actively delivered during a standardized time span. As these count data heavily distributed from normality, Box-Cox-transformation was applied [2, 13].
Structural variants and short tandem repeats discovery
Illumina whole genome sequencing data were mapped to chicken genome version GRCg6a (GCF_000002315.5 RefSeq assembly) and used to call SNPs and short (< 50bp) insertions and deletions (InDels) in our previous study [2]. SVs were called as described by Blaj et al. [21] with slightly different settings: A high confidence SV call set was produced from the output of three variant callers: smove v0.2.6 (Brent, P. (2018). Smoove. https://brentp.github.io/post/smoove/), DELLY v0.7.7 [22], and manta v1.6.0 [23]. SURVIROR v1.0.7 [24] was used to combine the output of the three variant callers with the following settings: maximum distance between breakpoints of 1000 bp, minimum number of supporting callers: 2, SV type and strands were taken into account, and the minimum SV size was set to 30 bp. Variants with a call-rate < 0.8 and variants with QUAL < 1000 were removed. Tandem repeats were called with GangSTR [25]. A library of known TRs as input for GangSTR was acquired from the UCSC data repository (https://hgdownload.soe.ucsc.edu/goldenPath/galGal6/bigZips/). TRs were filtered to retain genotypes with a minimum sequence depth (DP) of 10, a quality score (Q) higher than 0.8, and a call-rate of 0.8.
Haplotype construction and imputation
To impute medium density chip genotypes to whole genome level SNPs and InDels we employed the same strategy as we previously described [2] with the deviation that all imputation steps were performed with Beagle 5.2 [26] and the setting ne=1000. SVs and TRs were merged with SNPs and InDels from our previous study, which were acquired with the genome analysis toolkit (GATK) [27]. This merged call set was phased with Beagle 5.2 for the estimation of haplotypes and used as a reference panel to impute medium density chip genotypes to SVs and TRs with the same strategy as for SNPs and InDels. To remove SNPs and InDels from the imputed SVs/TRs GATK v4.1.8.1 SelectVariants was utilized.
Detection of quantitative trait loci
Prior to GWAS multiallelic variants from the SVs/TR dataset were converted to biallelic variants with the norm function from bcftools (v. 1.14) [28]. All GWAS were conducted with gcta v. 1.92.3 beta3 [29] applying a mixed linear model association analysis with a leaving-one-chromosome-out approach. The phenotype used for GWAS, “Feather Pecks Delivered Box-Cox Transformed” (FPD_BC), has been described by Iffland et al. [13]. Phenotypes for expression GWAS (eGWAS) were normalized gene expression data from our previous study [8]. Information on the hatch was used as a covariate in all GWAS and eGWAS. Genomic relationship matrices were created from the target 60k SNP Chip genotypes. Meta-analyses of GWAS results were performed with METAL [30] as previously described [2]. The proportion of variance in phenotype explained by a given SNP (PVE) was calculated according to Shim et al. [31].
Association weight matrix construction
The AWM was created as described in our previous study [8] by deploying the strategy for AWM construction by Reverter and Fortes [32] followed by the detection of significant gene-gene interactions with their PCIT algorithm [33]. Input variants were chosen as follows: P value < 1 × 10−4 for the main phenotype (FPD_BC) or P value < 1 × 10−4 in least ten of the eGWAS. That way, variants affecting the main phenotype and gene expression were, were considered in the analysis. This led to the selection of 57 input variants, 0.16% of all detected SVs and TRs for the HS population. The R script by Reverter and Fortes was modified by setting the P value threshold for primary and secondary SNP selection to 1 × 10−4. The gene-gene interaction map was constructed with Cytoscape and gene class information was acquired with PANTHER [34] and UniProt [35].
Transcription factor enrichment
To detect significant binding site enrichment for the transcription factor ETV1 the CiiiDER software (build May 15th, 2020) [36] was employed with frequency matrix MA0761.2 (https://jaspar.genereg.net/matrix/MA0761.2/). Additional members of the ETS (E twenty-six) family of transcription factors used in the analysis were ETV2 (MA0762.1), ETV3 (MA0763.1), ETV4 (MA0764.2), ETV5 (MA0765.1), ETV6 (MA0645.1), and ETV7 (MA1708.1). Genes that were associated with the ETV1 variant (P value > 1 × 10−4; Supplementary Information S1) were used as input and DEGs with a Log2 fold change < 0.2 were used as background genes. The analysis was performed as in our previous study [8].
Results
Genome-wide association studies with different classes of genetic variants
In total 63,824 TRs and 11,098 SVs were discovered in the joint variant calling of whole genome sequenced HS and F0 chickens. The SVs contained 6,014 deletions, 2,741 inversions, 1,334 duplications, and 995 translocations. The variant calling of SNPs and InDels was reported in our previous study and yielded 12,864,421 SNPs and 2,142,539 InDels [2]. To grasp the whole depth of the datasets at hand we repeated the imputation of SNPs and InDels with the most recent Beagle version (v. 5.2) [26] and set the effective population size to 1000, which is known to improve the imputation accuracy in small populations [37]. GWAS results for the trait FPD_BC were analyzed for both experimental designs, the F2 cross and the HS population, separately and after combining the results in a metaanalysis. Manhattan plots from GWAS with imputed SNPs/InDels and imputed SVs/TRs are shown in Figure 1a. Common peaks for both variants classes on GGA1 and GGA2 were observable in the F2 cross, as well as on GGA1 and GGA3 of the HS population. QTL are summarized in Table 1 and lead SVs and TRs with their predicted effects are listed in Table 2.
The only variant that reached genome-wide significance after Bonferroni correction (-Log10P > 1.41×10−6) was a TR in the HS population 126,821 bp downstream of GABRB3. The proportion of variance in the phenotype FPD_BC, which is explained by this variant is 0.047. Given heritability estimates of around 0.15 [38–40] the amount of phenotypic variance explained by the lead variant is considerable. To visualize overlaps between the six different sets of results we created a heat map, which shows the –Log10P values of the top 20 associated genes from the different GWAS (Figure 1b). Numerous genes showed high association signals in both variant classes in the same experimental population. For instance, ENSGALG00000047084, GABRB3, INPP4A, ATP10A, TSGA10, LIPT1, COA5, UNC50, SHOX, CRLS2, and AFF3 showed high association in the HS population for both variant classes. The same holds true for the F2 cross (upper block of 20 genes in Figure 1b), although associations were not as strong as in the HS population. Of considerable interest are those genes, which show association in all GWAS: ENSGALG00000048825, ENSGALG00000051557, ENSGALG00000051256, ENSGALG00000016495, FHOD3, gga-mir-187, GALNT1, and PIEZO2. Predicted targets of the micro RNA (miRNA) gga-mir-187 are GABRA1, GABRB2, GABRB3, GABRG1, and GABRG2. GABRB2 is also a predicted target of gga-mir-3524b (www.targetscan.org accessed January 27, 2022).
Expression quantitative trait loci
To clarify whether the SVs and TRs that we detected influence the expression of transcripts that we identified in a previous study to be differentially expressed between LFP and HFP [1], we performed an eQTL analysis. We employed the same strategy as in our past study [8] and performed eGWAS for 86 genes from 167 chickens (84 HFP and 83 LFP) from the HS population (Manhattan plots are summarized in Supplementary Information S2). A total of 35,571 SVs and TRs were screened for association with gene expression and we detected 909 genome-wide significant associated signals. SVs and TRs for which we detected genome-wide association with at least 10 DEGs are shown in Table 3. To identify significant gene-gene interactions from those 86 eGWAS we constructed an association weight matrix [32] followed by the detection of significant correlations with the PCIT algorithm [33]. Central to the gene-gene interaction map is the transcription factor ETV1. We selected DEGs with a P value < 1×10−4 for association with the variant, a tandem repeat with the sequence CCCGGCCCG 70 bp upstream of ETV1 (GGA2: 27337541). With the 23 selected genes (Supplementary Information S1) we performed a transcription factor binding site enrichment analysis with Ciiider and found a significant enrichment (P value = 0.013, Log2-enrichment = 0.494) of ETV1 binding sites (Figure 3a) in proximity to those genes (Figure 3b, Supplementary Information S3). To demonstrate specificity we included all available PWMs for members of the ETS family of transcription factors: ETV2 – ETV6. The only other member of the TF family for which transcription factor binding site enrichment was detected was ETV4, but the P value was not significant. The complete results of the analysis are summarized in Supplementary Information S4.
Discussion
Numerous studies have been conducted to unravel the genetics behind FP behavior in chickens, as reviewed in [12, 41], but none of these focused on the analysis of structural genetic variation. We combined data of two well-described experimental crosses, an F2 design and a HS population, and performed multiple GWAS and an eQTL analysis on different classes of genetic variants. The lead variant from these analyses is a TR 126 kb downstream of the GABRB3 gene, which explains a considerable amount of the observed phenotypic variance. Furthermore, miRNA gga-mir-187 is strongly associated in all conducted GWAS (Figure 1b), and among predicted targets of this regulatory RNA are several GABA receptor genes. Based on whole-brain transcriptome analyses of the light response of HFP we previously postulated that downregulation or missing upregulation of GABA receptor expression is caused by miRNA dysregulation due to low levels of Dicer1 expression in HFP brains [14]. The findings presented here provide further evidence for a disturbance in miRNA regulation and involvement of the GABAergic system in FP behavior. Since these chickens have been selected for feather pecking behavior based on their estimated breeding values for multiple generations, it is possible that mutations leading to low expression of several GABA receptors have been accumulating. Previous studies with the same experimental population already pointed into that direction [2, 13]. But these are not only limited to GABA receptor genes, miRNAs or genes encoding miRNA processing proteins. By conducting an eQTL analysis with SVs and TRs we were able to confirm an involvement of DMD (dystrophin) in the regulation of genes that are differentially expressed between HFP and LFP. We already discovered DMD in an eQTL analysis with SNPs and InDels [8]. Since dystrophin is a direct regulator of GABA receptor clustering [42], this adds another layer to the GABA receptor disturbance in HFP brains. Several other genes that appear in the gene-gene interaction map based on significant AWM correlations (Figure 2) have been connected to the GABAergic system. These include COQ4 [43], ETV1 [44], NEURL1 [45], SLC25A26, and SLC27A4 [46]. In this regard, ETV1 is of considerable interest since it is the only transcription factor that we identified in our eQTL analysis with SVs and TRs. According to the Human Protein Atlas ETV1 expression is specific to salivary glands and the brain, specifically the cerebellum and thalamus (https://www.proteinatlas.org/ENSG00000006468-ETV1; accessed August 2022) ETV1 is associated with the differential expression of 23 genes between HFP and LFP and ETV1 binding sites are enriched in proximity to these genes (Figure 3). Among these putative ETV1 targets are CSF2RB, which has been associated with major depression and schizophrenia [47], and MIS18BP1, a gene implicated in autism spectrum [48]. Furthermore, ETV1 antibody coimmunoprecipitated the GABAA receptor α6 (GABAARα6) promoter region in mice [44] and is a regulator of gene expression in CD4 and CD8 T cells [49]. Hence, ETV1 may not only impact GABA receptor expression but might also participate in neurodevelopment. As demonstrated by Pasciuto et al. a lack of CD4 T cells in the brain of mice leads to excess immature neuronal synapses and behavioral abnormalities. Among the top DEGs identified by single-cell RNAseq were the transcription factors Klf2 and Klf4 [16]. We previously postulated that FP behavior is the result of disturbances in embryonic neurodevelopment and identified KLF14 as a major regulator in this regard [8]. In that study, we also identified a Smad4 domain-containing transcription factor, namely ENSGALG00000042129 (CHTOP), but we were not able to detect enrichment of binding sites for that transcription factor in proximity to its associated DEGs. However, in the light of the fact that ETV1 forms functional complexes with Smad4 [50] we propose a model in which the transcription factors ETV1, CHTOP, and KLF14 have an additive effect on the density of T cells in the developing brain of HFP. This might lead to structural abnormalities in the brains of HFP and may explain, at least in part, their abnormal behavior. Multiple studies point towards a central role of Krüppel-Like factors in neurodevelopment and behavior [51–55]. However, conclusive results on the function of Krüpple-like factors in the neurodevelopment in chickens have not been reported yet and research is hampered by erroneous annotation of KLF orthologues [56].
Apart from DMD we also discovered the genes PPP1R9A, INPP4A, and COA5 in both eQTL analyses. One of which was performed with SVs and TRs (Figure 2) and one with SNPs and InDels [8]. In schizophrenia and bipolar disorder, the prefrontal cortical expression of PPP1R9A was altered [57] and its gene product Neurabin regulates anxiety-like behavior in adult mice [58]. INPP4A has been implicated in multiple neurological conditions, namely schizophrenia, autism, epilepsy, and intellectual disability [59–61]. An additional link to schizophrenia is a frameshift variant in the GPATCH8 gene in exon 9 of transcript GPATCH8-201 (Table 3), an ortholog of ZNF804A, which has been shown to impact various mental illnesses via pre-mRNA processing [62]. We discovered multiple links of FP to human psychiatric disorders in our previous studies [1, 2, 14]. The fact that methylation of KLF14 correlates with psychosis severity in schizophrenia patients [63] strengthens our argument for the applicability of HFP chickens in the research of the basic mechanisms involved in human psychiatric disorders. Furthermore, by conducting multiple GWAS on two experimental populations of FP chickens with different variant classes with whole genome marker density we identified numerous genes that have previously been linked to psychiatric disorders or other neurological conditions or phenotypes (Table 4). These genes coherently fit the mechanistic hypotheses raised so far and at the same time underpin the complex nature of this trait. Among these are anxiety, depression, intellectual disability, autism, compulsive behavior, drug addiction, bipolar disorder, and schizophrenia. This raises the question, which common mechanisms these conditions share, which is a current matter of debate [64]. Blokhin et al. proposed that future treatment of human psychiatric disorders should be tailored under the use of a genome-wide “targetome”. The strong genetic burden of HFP chickens with mutations affecting neuropsychiatric genes makes this line of chickens a valid model system to study the effects of tailored drug treatment strategies.
Statements and Declarations
Availability of data and materials
All methods applied here have been outlined in previous studies [1, 2]. The raw RNA sequencing data has been deposited at the NCBI Sequence Read Archive (BioProject ID PRJNA656654) and the raw whole genome sequencing data as well (BioProject ID PRJNA664592).
Competing interests
The authors declare to have no competing interests of any kind.
Ethics approval
The research protocol was approved by the German Ethical Commission of Animal Welfare of the Provincial Government of Baden-Wuerttemberg, Germany (code: HOH 35/15 PG, date of approval: April 25, 2017).
Consent to participate
Not applicable
Consent for publication
Not applicable
Funding
The study was funded by the German Research Foundation (DFG) under file numbers TE622/4-2 and BE3703/8-2. The funders had no role in study design, data collection and analysis, and interpretation of data and in writing the manuscript. Publication fee was covered by the Open Access Publication Funds of the Göttingen University
Contributions
Conceptualization: Clemens Falker-Gieske, Jens Tetens; Methodology: Clemens Falker-Gieske, Jens Tetens; Formal analysis and investigation: Clemens Falker-Gieske; Writing - original draft preparation: Clemens Falker-Gieske; Writing - review and editing: Jens Tetens, Jörn Bennewitz; Funding acquisition: Jens Tetens, Jörn Bennewitz; Resources: Jens Tetens, Jörn Bennewitz; Supervision: Jens Tetens.
Supplementary Information
Supplementary Information S1: Differentially expressed genes between high and low feather pecking chickens, which were associated with a tandem repeat 70 bp downstream of the ETV1 transcription factor.
Supplementary Information S2: Manhattan plots of expression genome wide associations studies conducted on genes, which were differentially expressed between high and low feather pecking chickens.
Supplementary Information S3: Graphical representation of ETV1 binding sites in proximity to associated differentially expressed genes.
Supplementary Information S4: Results of transcription factor binding site enrichment analysis with Ciiider for transcription factors of the ETS (E twenty-six) family to differentially expressed genes that were associated with a tandem repeat downstream of ETV1.
Acknowledgements
We acknowledge support by the Open Access Publication Funds of the Göttingen University.
References
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.
- 40.↵
- 41.↵
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.
- 53.
- 54.
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.
- 66.
- 67.
- 68.
- 69.
- 70.
- 71.
- 72.
- 73.
- 74.
- 75.
- 76.
- 77.
- 78.
- 79.
- 80.
- 81.
- 82.
- 83.
- 84.
- 85.
- 86.
- 87.
- 88.
- 89.