ABSTRACT
The social hymenoptera are emerging as models for epigenetics. In mammals and flowering plants, methylation affects allele specific expression. There is contradictory evidence for the role of methylation in allele specific expression in social insects. The aim of this paper is to investigate allele specific expression and monoallelic methylation in the bumblebee, Bombus terrestris. We found nineteen genes that were both monoallelically methylated and monoallelically expressed in a single bee. A number of these genes are involved in reproduction. Fourteen of these genes express the hypermethylated allele, while the other five express the hypomethylated allele. We also searched for allele specific expression in twenty-nine published RNA-seq libraries and found 555 loci with allele specific expression. Genomic imprinting in mammals often involves monoallelic methylation and expression. It is tempting to associate our results with genomic imprinting, especially as a number of the genes discovered are exactly the type predicted by theory to be imprinted. Caution, however, should be applied due to the lack of understanding of the functional role of methylation in gene expression in insects and the as yet unquantified role of genetic cis effects in insect allele specific methylation and expression.
INTRODUCTION
Epigenetics is the study of heritable changes in gene expression that do not involve changes to the underlying DNA sequence (Goldberg et al. 2007). Social hymenoptera (ants, bees, and wasps) are important emerging models for epigenetics (Glastad et al. 2011; Weiner and Toth 2012; Welch and Lister 2014; Yan et al. 2014). This is due to theoretical predictions for a role for an epigenetic phenomenon, genomic imprinting (parent of origin allele specific expression), in their social organisation (Queller 2003), the recent discovery of parent-of-origin allele specific expression in honeybees (Galbraith et al. 2016), and data showing a fundamental role in social insect biology for DNA methylation, an epigenetic marker (Chittka et al. 2012).
However, the presence of allele specific expression does not necessarily mean an epigenetic process is involved. Allele specific expression is known to be caused by a number of genetic as well as epigenetic processes (Palacios et al. 2009). The genetic processes usually involve cis effects such as variation in transcription factor binding sites or, less often, in untranslated regions that alter RNA stability or microRNA binding (Farh et al. 2005).
In mammals and flowering plants, allele specific expression is often associated with methylation marks passed from parents to offspring (Reik and Walter 2001). However, DNA methylation is involved in numerous other cellular processes (Bird 2002). There is contradictory evidence for the role of methylation in allele specific expression in social insects. Methylation is associated with allele specific expression at a number of loci in the ants Camponotus floridanus and Harpegnathos saltator (Bonasio et al. 2012). Recently, we found evidence for allele specific expression in bumblebee worker reproduction genes (Amarasinghe et al. 2015) and that methylation is important in bumblebee worker reproduction (Amarasinghe et al. 2014). However, other work on the honeybee Apis mellifera found no link between genes showing allele specific expression and known methylation sites in that species (Kocher et al. 2015).
The aim of this paper is to investigate allele specific expression and methylation in the bumblebee, Bombus terrestris. The recently sequenced genome of B. terrestris displays a full complement of genes involved in the methylation system (Sadd et al. 2015). An extreme form of allele specific expression is monoallelic expression, where one allele is completely silenced. In the canonical mammal and flowering plant systems, this is often associated with monoallelic methylation. In this paper, we examined the link between monoallelic methylation and monoallelic expression in B. terrestris by examining two whole methylome libraries and an RNA-seq library from the same bee. MeDIP-seq is an immunoprecipitation technique that creates libraries enriched for methylated cytosines (Harris et al. 2010). Methyl-sensitive restriction enzymes can create libraries that are enriched for non-methylated cytosines (MRE-seq) (Harris et al. 2010). Genes found in both libraries are monoallelically methylated, with the hypermethylated allele being in the MeDIP-seq data and the hypomethylated allele in the MRE-seq data (Harris et al. 2010). Monoallelic expression was then identified in these loci from the RNA-seq library. If only one allele was expressed, we concluded that the locus was both monoallelically methylated and monoallelically expressed in this bee. We confirmed this monoallelic expression in one locus using qPCR.
We then searched more generally for allele specific expression by analysing twenty-nine published RNA-seq libraries from worker bumblebees (Harrison et al. 2015; Riddell et al. 2014). We identified heterozygotes in the RNA-seq libraries and measured the expression of each allele. We then identified loci that showed significant expression differences between their two alleles.
MATERIALS AND METHODS
Samples
Data from twenty-nine RNA-seq libraries were used for the allele specific expression analysis (six from Harrison et al. (2015) and twenty-three from Riddell et al. (2014)). The Riddell bees came from two colonies, one commercially reared bumblebee colony from Koppert Biological Systems U.K. and one colony from a wild caught queen from the botanic gardens, Leicester. The Harrison bees were from three commercially reared colonies obtained from Agralan Ltd. A Koppert colony worker bee was used for the MeDIP-seq / MRE-seq / RNA-seq experiment, and was from a separate Koppert colony to the bees used for the qPCR analysis. Samples are outlined in Table 1. Colonies were fed ad libitum with pollen (Percie du sert, France) and 50% diluted glucose/fructose mix (Meliose – Roquette, France). Before and during the experiments colonies were kept at 26°C and 60% humidity in constant red light.
Next generation sequencing
MeDIP-seq, MRE-seq and RNA-seq
RNA and DNA were extracted from a single five day old whole bee (Colony K2). DNA was extracted using an ethanol precipitation method. Total RNA was extracted using Tri-reagent (Sigma-Aldrich, UK).
Three libraries were prepared from this bee by Eurofins Genomics. These were MeDIP-seq and MRE-seq libraries from the DNA sample and one amplified short insert cDNA library (insert size 150-400 bp) from the RNA sample. Both the MeDIP-seq and MRE-seq library preparations are based on previously published protocols (Harris et al. 2010). MeDIP-seq uses monoclonal antibodies against 5-methylcytosine to enrich for methylated DNA independent of DNA sequence. MRE-seq enriches for unmethylated cytosines by using methylation-sensitive enzymes that cut only restriction sites with unmethylated CpGs. Each library was individually indexed. Sequencing was performed on an Illumina HiSeq®2000 instrument (Illumina, Inc.) following the manufacturer’s protocol. Multiplexed 100 base paired-read runs were carried out, yielding 9390 Mbp for the MeDIP-seq library, 11597 Mbp for the MRE-seq library and 8638 Mbp for the RNA-seq library.
Previously published RNA-seq
Full details of the RNA-seq protocols used have been published previously (Harrison et al. 2015; Riddell et al. 2014). Briefly, for the Riddell bees, total RNA was extracted from twenty-three individual homogenised abdomens using Tri-reagent (Sigma-Aldrich, UK). TruSeq RNA-seq libraries were made from the 23 samples at NBAF Edinburgh. Multiplexed 50 base single-read runs were performed on an Illumina HiSeq®2000 instrument (Illumina, Inc.) following the manufacturer’s protocol. For the Harrison bees, total RNA was extracted from whole bodies using a GenElute Mammalian Total RNA Miniprep kit (Sigma-Aldrich) following the manufacturer’s protocol. The six libraries were sequenced as multiplexed 50 base single-read runs on an Illumina HiSeq 2500 system in rapid mode at the Edinburgh Genomics facility of the University of Edinburgh.
Monoallelic methylation and expression - Bioinformatic analysis
We searched for genes that were monoallelically methylated (present in both methylation libraries), heterozygous and monoallelically expressed (only one allele present in the RNA-seq library).
Alignment and bam refinement
mRNA reads were aligned to the Bombus terrestris genome assembly (AELG00000000) using Tophat (Kim et al. 2013) and converted to bam files with Samtools (Li et al. 2009). Reads were labelled with the AddOrReplaceReadGroups.jar utility in Picard (http://picard.sourceforge.net/). The MRE-seq and MeDIP-seq reads were aligned to the genome using BWA mapper (Li and Durbin 2009). The resultant sam alignments were soft-clipped with the CleanSam.jar utility in Picard and converted to bam format with Samtools. The Picard utility AddOrReplaceReadGroups.jar was used to label the MRE and MeDIP reads, which were then locally re-aligned with GATK (DePristo et al. 2011; McKenna et al. 2010). PCR duplicates for all bams (mRNA, MeDIP and MRE) were marked with the Picard utility MarkDuplicates.jar.
Identifying regions of interest and integrating data
Coverage of each data type was calculated using GATK DepthOfCoverage (McKenna et al. 2010). Only regions with a read depth of at least six in each of the libraries (RNA-seq, MeDIP-seq and MRE-seq) were used. Heterozygotes were identified using Samtools mpileup and bcftools on each data set separately (Li and Durbin 2009) and results were merged with VCFtools (Danecek et al. 2011). CpG islands were identified using CpG island searcher (Takai and Jones 2002). Regions of mRNA with overlaps of MeDIP, MRE, CpG islands and monoallelic snps were identified with custom perl scripts.
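The selection logic described in this section can be sketched as follows. This is a minimal illustration only, not the custom perl scripts used in the study: the `Site` record, its field names, the example counts and the "at least one read" rule for calling an allele present are all assumptions made for the example. A heterozygous site is kept when it has a read depth of at least six in every library, shows one allele in the MeDIP-seq reads and the other in the MRE-seq reads, and shows a single allele in the RNA-seq reads.

```python
from dataclasses import dataclass

MIN_DEPTH = 6  # minimum read depth required in each library (from the text)


@dataclass
class Site:
    """Allele read counts at one heterozygous SNP (hypothetical layout)."""
    name: str
    medip: dict  # allele -> read count in the MeDIP-seq library
    mre: dict    # allele -> read count in the MRE-seq library
    rna: dict    # allele -> read count in the RNA-seq library


def alleles_seen(counts):
    """Alleles with at least one read (assumed rule, ignores base-call errors)."""
    return {allele for allele, n in counts.items() if n > 0}


def classify(site):
    """Return which allele is expressed, or None if the site is filtered out."""
    libraries = (site.medip, site.mre, site.rna)
    if any(sum(counts.values()) < MIN_DEPTH for counts in libraries):
        return None  # insufficient coverage in at least one library
    medip, mre, rna = (alleles_seen(counts) for counts in libraries)
    # Monoallelic methylation: one allele in MeDIP-seq, the other in MRE-seq.
    if len(medip) != 1 or len(mre) != 1 or medip == mre:
        return None
    # Monoallelic expression: a single allele in the RNA-seq reads.
    if len(rna) != 1:
        return None
    if rna == medip:
        return "hypermethylated allele expressed"
    if rna == mre:
        return "hypomethylated allele expressed"
    return None  # expressed allele matches neither methylation library


example = Site("snp1", {"C": 8, "T": 0}, {"C": 0, "T": 7}, {"C": 9, "T": 0})
print(example.name, classify(example))
```

In real data a strict zero-read rule would be too sensitive to sequencing errors, so a practical version would likely replace `alleles_seen` with a thresholded or statistical call.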
Allele specific expression - Bioinformatic analysis
We created a pipeline to search for heterozygous loci that show allele specific expression and identify the associated enriched gene ontology (GO) terms in twenty-nine previously published RNA-seq libraries (Harrison et al. 2015; Riddell et al. 2014).
Each RNA library was mapped to the Bombus terrestris reference genome (Bter 1.0, accession AELG00000000.1) (Sadd et al. 2015) using the BWA mapper (Li and Durbin 2009). The mean GC content of the 29 libraries was 42.34%, with individual libraries having a similar GC content ranging from 40-46%. GC content differed with run (Nested ANOVA: F = 20.302, df = 1, p < 0.001), but not by colony (Nested ANOVA: F = 1.763, df = 4, p = 0.171). The mean coverage of the 29 libraries was 13.29, with mean library coverage ranging from 9.84 to 17.61. Run had an effect on coverage (Nested ANOVA: F = 7.554, df = 1, p = 0.011), as did colony (Nested ANOVA: F = 6.962, df = 4, p < 0.001).
Therefore, the ComBat method in the R package sva (version 3.20.0) was used to remove any batch effects and control for original differences in coverage (Leek et al. 2012; Johnson et al. 2007). The success of this control was confirmed with the R package edgeR (version 3.14.0) (McCarthy et al. 2012; Robinson et al. 2010). The sva adjustment reduced the edgeR dispersion value from 3.9994 (BCV=2) to 0 (BCV=0.0003) (supplementary figure 1).
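As a quick check on the reported values, edgeR defines the biological coefficient of variation (BCV) as the square root of the negative binomial dispersion. The short sketch below, with hypothetical function names, converts between the two; it shows that a dispersion of 3.9994 corresponds to a BCV of about 2, and that a BCV of 0.0003 corresponds to a dispersion that rounds to 0.

```python
import math


def bcv_from_dispersion(dispersion: float) -> float:
    """BCV is the square root of the negative binomial dispersion."""
    return math.sqrt(dispersion)


def dispersion_from_bcv(bcv: float) -> float:
    """Inverse relation: dispersion is the BCV squared."""
    return bcv ** 2


print(round(bcv_from_dispersion(3.9994), 2))  # dispersion before adjustment
print(dispersion_from_bcv(0.0003))            # dispersion implied after adjustment
```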
Bcftools (version 0.1.19-44428cd), bedtools (version 2.17.0), and samtools (version 0.1.19-44428cd) were used to prepare the RNA libraries and call the SNPs, before the SNPs were filtered based on mapping quality score (Quinlan and Hall 2010; Li and Durbin 2009). Only SNPs with a mapping quality score of p <0.05 and a read depth of ≥6 were included in the analyses. The R package QuASAR was then used to identify genotypes (according to Hardy-Weinberg equilibrium) and locate any allele specific expression at heterozygous sites (Harvey et al. 2014). QuASAR removes SNPs with extreme differential allele expression from the analyses, thus controlling for any base-calling errors. The loci (the SNP position +/− 2900bp) identified as showing ASE in at least three of the twenty-nine libraries were blasted (Blastx) against Drosophila melanogaster proteins (non-redundant (nr) database) (Altschul 1997). The blast results were annotated using Blast2GO (Gotz et al. 2008). Fisher’s exact test was implemented to identify enriched GO terms, which were then visualised using REVIGO (Supek et al. 2011). To identify which bumblebee genes the SNPs were located in, the SNP position +/− 25 bp was blasted (Blastn) against the Bombus terrestris genome (Sadd et al. 2015).
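The step that selects recurrent loci and builds the query windows can be sketched as below. This is an illustrative outline under stated assumptions, not the pipeline used in the study: the input format (one tuple per significant SNP per library), the function names and the example scaffold names are hypothetical. Only the thresholds (at least three libraries, a flank of 2900 bp) come from the text.

```python
from collections import defaultdict

MIN_LIBRARIES = 3  # a locus must show ASE in at least this many libraries
FLANK_BP = 2900    # flank added on each side of the SNP before Blastx


def recurrent_ase_sites(calls, min_libraries=MIN_LIBRARIES):
    """Return (scaffold, position) sites with ASE in enough distinct libraries.

    calls: iterable of (library_id, scaffold, position) tuples, one per
    SNP reported as showing allele specific expression in that library.
    """
    libraries_per_site = defaultdict(set)
    for library_id, scaffold, position in calls:
        libraries_per_site[(scaffold, position)].add(library_id)
    return sorted(
        site for site, libs in libraries_per_site.items()
        if len(libs) >= min_libraries
    )


def flanking_window(scaffold, position, flank=FLANK_BP):
    """1-based inclusive window around a SNP, clipped at the scaffold start."""
    return scaffold, max(1, position - flank), position + flank


calls = [
    ("lib01", "scaffoldA", 5000), ("lib02", "scaffoldA", 5000),
    ("lib07", "scaffoldA", 5000), ("lib01", "scaffoldB", 120),
    ("lib02", "scaffoldB", 120),
]
for scaffold, position in recurrent_ase_sites(calls):
    print(flanking_window(scaffold, position))
```

Using a set per site means a SNP reported twice in the same library is still counted once. The upper end of the window is not clipped because scaffold lengths are not part of this toy input.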
Candidate gene allele specific qPCR
DNA was extracted from four bees from three Koppert colonies using the Qiagen DNA Micro kit according to the manufacturer’s instructions. RNA was extracted from samples of the heads of the same worker bees with the QIAGEN RNeasy Mini Kit according to the manufacturer’s instructions. cDNA was synthesized from an 8µl sample of RNA using the Tetro cDNA synthesis Kit (Bioline) as per the manufacturer’s instructions.
We amplified numerous fragments of the 19 candidate genes. Sanger sequencing results were analyzed using the heterozygote analysis module in Geneious version 7.3.0 to identify heterozygotic nucleotide positions. It was difficult to identify snps in exonic regions of the 19 loci, which could be amplified with primers of suitable efficiency. We managed to identify a suitable region in slit homolog 2 protein-like (AELG01000623.1 exonic region 1838-2420).
The locus was run in three different reactions: T allele, G allele and reference. Reference primers were designed according to Gineikiene et al. (2009). A common reverse primer (CTGGTTCCCGTCCAATCTAA) was used for all three reactions. A reference forward primer (CGTGTCCAGAATCGACAATG) was designed to the same target heterozygote sequence, upstream of the heterozygote nucleotide position. The reference primers measure the total expression of the gene, whereas the allele specific primers (T allele: CCAGAATCGACAATGACTCGT, G allele: CAGAATCGACAATGACTCGG) measure the amount of expression due to the allele. Thus the ratio between the allele specific expression and reference locus expression gives the relative expression due to the allele.
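The relationship between the three forward primers can be checked directly from the sequences given above. The sketch below is only an illustration with hypothetical helper names: it confirms that the two allele specific primers are identical apart from their 3′ terminal base and a one-base difference in length, and measures how far the reference primer overlaps the allele specific primer, which shows that it ends upstream of the SNP position.

```python
T_ALLELE = "CCAGAATCGACAATGACTCGT"   # allele specific forward primer (T)
G_ALLELE = "CAGAATCGACAATGACTCGG"    # allele specific forward primer (G)
REFERENCE = "CGTGTCCAGAATCGACAATG"   # reference forward primer


def compare_allele_primers(p1, p2):
    """Compare two allele specific primers aligned at their 3' ends."""
    body1, body2 = p1[:-1], p2[:-1]  # sequences without the terminal base
    shorter = min(len(body1), len(body2))
    return {
        "terminal_bases": (p1[-1], p2[-1]),
        "length_difference": abs(len(p1) - len(p2)),
        "bodies_match_at_3prime": body1[-shorter:] == body2[-shorter:],
    }


def overlap_length(upstream, downstream):
    """Longest suffix of `upstream` that is also a prefix of `downstream`."""
    for size in range(min(len(upstream), len(downstream)), 0, -1):
        if upstream[-size:] == downstream[:size]:
            return size
    return 0


print(compare_allele_primers(T_ALLELE, G_ALLELE))
overlap = overlap_length(REFERENCE, T_ALLELE)
# Bases of the T primer lying beyond the 3' end of the reference primer,
# the last of which is the SNP base itself.
print(len(T_ALLELE) - overlap)
```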
Three replicate samples were run for each reaction. All reactions were prepared by the Corbett robotics machine, in 96 well qPCR plates (Thermo Scientific, UK). The qPCR reaction mix (20µl) was composed of 1µl of diluted cDNA (50ng/µl), 1µl of forward and reverse primer (5µM/µl each), 10µl 2X SYBR Green JumpStart Taq ReadyMix (Sigma Aldrich, UK) and 7µl ddH2O. Samples were run in a PTC-200 MJ thermocycler. The qPCR profile was: 4 minutes at 95°C denaturation followed by 40 cycles of 30s at 95°C, 30s at 59°C and 30s at 72°C and a final extension of 5 minutes at 72°C.
The forward primers differ both in their terminal base (to match the snp) and in their length. It is entirely possible that they may amplify more or less efficiently even if there was no difference in the amount of template (Pfaffl 2001). To test for this we repeated all qPCRs with genomic DNA (1µl of diluted DNA (20ng/µl)) from the same bees as the template. We would expect equal amounts of each allele in the genomic DNA. We also measured the efficiency of each reaction as per Liu and Saint (2002).
Median Ct was calculated for each set of three technical replicates. A measure of relative expression (ratio) was calculated for each allele in each worker bee from the Ct values of the allele specific and reference reactions, corrected for E, the median efficiency of each primer set (Liu and Saint 2002; Pfaffl 2001). All statistical analysis was carried out using R (3.1.0) (R Core Team 2015).
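The ratio equation itself is not reproduced in this text. As a minimal sketch, assuming the standard efficiency-corrected form in the style of Pfaffl (2001), ratio = E_allele^(-Ct_allele) / E_reference^(-Ct_reference), the calculation could look like the following. The function name, the example Ct values and the efficiencies are hypothetical, and E is taken here as the amplification factor per cycle (2.0 for perfect doubling), which is also an assumption.

```python
from statistics import median


def relative_expression(ct_allele, ct_reference, eff_allele, eff_reference):
    """Efficiency-corrected expression of one allele relative to the reference.

    ct_allele, ct_reference: Ct values of the technical replicates.
    eff_allele, eff_reference: amplification factor per cycle for each
    primer set (assumed convention: 2.0 means perfect doubling).
    """
    ct_a = median(ct_allele)      # median Ct across technical replicates
    ct_r = median(ct_reference)
    return eff_allele ** (-ct_a) / eff_reference ** (-ct_r)


# Hypothetical triplicates: the allele reaction crosses threshold one
# cycle later than the reference reaction, with perfect efficiency.
ratio = relative_expression([25.1, 25.0, 24.8], [24.0, 23.9, 24.2], 2.0, 2.0)
print(ratio)
```

With equal efficiencies the ratio reduces to E raised to the Ct difference, so a one-cycle delay gives a ratio of one half. Running the same calculation on the genomic DNA controls, where both alleles are expected in equal amounts, indicates whether a difference comes from the primers rather than the template.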
Data Availability
All sequence data for this study are archived at European Nucleotide Archive (ENA); Accession no. PRJEB9366 (http://www.ebi.ac.uk/ena/data/view/PRJEB9366), x, x. GO-analysis results and lists of differentially expressed transcripts are available as Supporting Information.
RESULTS
In total, we found nineteen genes that were both monoallelically methylated (present in both MeDIP-seq and MRE-seq libraries) and monoallelically expressed (only one allele present in the RNA-seq library). For an example, see Bicaudal-D in Figure 1. Of the nineteen genes, fourteen had the hypermethylated (MeDIP) allele expressed, while five had the hypomethylated (MRE-seq) allele expressed (see supplementary table 1).
Monoallelic expression was confirmed in one of these nineteen (slit homolog 2 protein-like (AELG01000623.1)) by allele specific qPCR (Amarasinghe et al. 2015). The allele with a guanine at the snp position had a mean expression of 6.04 ±8.28 (standard deviation) in four bees from three different colonies. The thymine allele was not expressed at all in these bees. This was not due to the efficiency of the primers as the DNA controls of both alleles showed similar amplification (G mean = 422.70 ±507.36, T mean = 1575.17 ±503.02). In the three other loci tested (Ras GTPase-activating protein 1, Ecdysone receptor, methionine aminopeptidase 1-like) we found apparent monoallelic expression, but could not dismiss primer efficiency as the cause.
The nineteen genes were blasted against the nr/nt database (blastn). Four returned no hits and a further four returned noninformative hits. A number of these genes had homologs known to be methylated in other animals (Table 2). Six of the eleven genes with informative hits have functions to do with social organisation in the social insects (Table 2).
We then looked at these nineteen genes in twenty-nine previously published RNA-seq libraries. Fifteen of these nineteen genes expressed a single allele in all twenty-nine RNA-seq libraries (supplementary table 2). The remaining four genes (AELG01000620.1, AELG01001021.1, AELG01002224.1a, AELG01002224.1b) were inconsistent; they showed expression of one allele in some B. terrestris workers, and expression of two alleles in other workers.
We then searched more generally for allele specific expression in the twenty-nine RNA-seq libraries. 555 loci showed allele specific expression in ≥3 of the 29 RNA-seq libraries (supplementary table 3). Blasting (Blastn) these loci against Bombus terrestris returned 211 hits. To search for gene ontology terms, we blasted (Blastx) against Drosophila melanogaster, which returned 329 hits. One hundred and fifty-one Gene Ontology (GO) terms were enriched in the 555 regions showing allele specific expression (Fisher’s exact test p <0.05); however, none were significant at the more stringent FDR <0.05. Figure 2 shows the large number of biological functions associated with these 555 genes.
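The gap between the uncorrected and FDR-corrected results can be illustrated with a small sketch. It assumes a one-sided Fisher's exact (hypergeometric) test for over-representation and the Benjamini-Hochberg procedure; the text does not state which sidedness or FDR procedure the annotation software applied, so both are assumptions, and the function names and p-values are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact p-value for over-representation.

    k: test-set loci annotated with the GO term
    n: size of the test set
    K: background loci annotated with the GO term
    N: size of the background (test set included)
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(fisher_enrichment_p(k=3, n=3, K=3, N=6))
raw = [0.01, 0.04, 0.03, 0.5]  # hypothetical per-term p-values
adj = benjamini_hochberg(raw)
print(sum(p < 0.05 for p in raw), sum(q < 0.05 for q in adj))
```

In this toy example three terms pass the raw threshold but only one survives correction. With many hundreds of GO terms tested, the correction is far more severe, which is consistent with terms being nominally enriched while none remain significant after FDR control.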
DISCUSSION
Of the nineteen genes displaying monoallelic methylation and monoallelic expression, fourteen had the hypermethylated (MeDIP) allele expressed, while five had the hypomethylated (MRE-seq) allele expressed (see supplementary table 1). In ant genes with allele specific methylation, the hypermethylated allele showed more expression than the hypomethylated allele (Bonasio et al. 2012). This fits with genome wide analysis that shows exonic methylation in insects associated with increased gene expression (Glastad et al. 2014; Yan et al. 2015). Our fourteen genes with the hypermethylated allele expressed agree with this pattern. But how to explain the five genes where the hypomethylated allele was expressed? Firstly, the role of methylation in insect gene expression is not clear cut, with the relationship between exonic methylation and expression often disappearing at the gene level (Yan et al. 2015). For example, EGFR expression is lower in ant workers that exhibit higher DNA methylation of EGFR (Alvarado et al. 2015). Secondly, even in the canonical mammalian methylation system, the “wrong” allele has been shown to be expressed occasionally due to lineage specific effects (Dean et al. 1998; Pardo-Manuel de Villena et al. 2000; Onyango et al. 2002; Sapienza 2002; Zhang et al. 1993).
We then looked at the expression of these nineteen genes in all twenty-nine RNA-seq libraries. If they are monoallelically expressed in these bees, we would find only one allele in a given RNA-seq library. Fifteen of these nineteen genes were confirmed to show a single allele in all twenty-nine RNA-seq libraries. However, we would also find only one allele if that bee were homozygous at the locus. We cannot rule out that these fifteen genes just happen to be homozygous in all twenty-nine bees from five different colonies from multiple sources.
The remaining four genes showed inconsistent expression, with one allele being expressed in some B. terrestris workers and two alleles in other workers. Natural intraspecific variation in allele specific expression has been found in other species (Pignatta et al. 2014). Another explanation is that these loci are not epigenetically controlled but rather their allele specific expression is derived from genetic effects (Remnant et al. 2016).
There are three main genetic, as opposed to epigenetic, causes of allele specific expression (Edsgard et al. 2016). First, differences in the alleles’ sequence within the translated region can result in a modified protein. Second, a change at the alleles’ cis regulatory sites can cause differential binding of transcription factors. Third, transcript processing can be affected by a change in the alleles’ sequence at a splice site or untranslated region. This large number of possible causes of allele specific expression could explain why we see so many functions associated with the 555 genes showing allele specific expression (Figure 2).
But it is not just allele specific expression that may have genetic as well as epigenetic effects. It has been shown in humans that some allele specific methylation is determined by DNA sequence in cis and therefore shows Mendelian inheritance patterns (Meaburn et al. 2010). An extreme example of genetically controlled allele specific methylation is found in Nasonia wasps, where there is no evidence for methylation driven allele specific expression but inheritable cis-mediated allele specific methylation has been found (Wang et al. 2016). This cis-mediated methylation has recently been suggested as being important in social insect biology (Remnant et al. 2016; Wedd et al. 2016).
We have found that allele specific expression is widespread in the bumblebee. We have also found that the extreme version of allele specific expression, monoallelic expression, is associated with monoallelic methylation. Genomic imprinting in mammals usually involves monoallelic methylation and expression. It is tempting to associate our results with genomic imprinting, especially as a number of the genes discovered are exactly the type predicted by theory to be imprinted (Queller 2003). Caution, however, should be applied due to the lack of understanding of the functional role of methylation in gene expression in insects and the as yet unquantified role of genetic cis effects in insect allele specific methylation and expression.
ETHICAL DECLARATION
The protocol reported here conforms to the regulatory requirements for animal experimentation in the United Kingdom.
COMPETING INTERESTS
The authors declare they have no competing interests.
AUTHOR CONTRIBUTIONS
ZNL analysed the data and wrote the initial draft. KDL analysed the data and was involved in the redrafting of the manuscript. HEA carried out the experiments and was involved in the redrafting of the manuscript. DN carried out the experiments and was involved in the redrafting of the manuscript. MK analysed the data and was involved in the redrafting of the manuscript. EBM designed the project, analysed the data and wrote the initial draft.
ACKNOWLEDGEMENTS
This work was financially supported by NERC grant no. NE/H010408/1 and NE/N010019/1 and NERC Biomolecular Analysis Facility research grants (NBAF 606 and 829) to EBM. Illumina library preparation, sequencing and bioinformatics were carried out by Edinburgh Genomics, The University of Edinburgh. Edinburgh Genomics is partly supported through core grants from NERC (R8/H10/56), MRC (MR/K001744/1) and BBSRC (BB/J004243/1). ZNL would like to thank UK BBSRC for its financial support via MIBTP. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Footnotes
1 Department of Genetics, University of Leicester, LE1 7RH, Leicester, U.K. ebm3@le.ac.uk