Abstract
Background Tardigrades are meiofaunal ecdysozoans that may be key to understanding the origins of Arthropoda. Many species of Tardigrada can survive extreme conditions by adopting a cryptobiotic state. A recent high-profile paper suggested that the genome of a model tardigrade, Hypsibius dujardini, has been shaped by unprecedented levels of horizontal gene transfer (HGT) encompassing 17% of protein-coding genes, and speculated that this HGT was likely formative in the evolution of stress resistance. We tested these claims using an independently sequenced and assembled genome of H. dujardini, derived from the same original culture isolate.
Results Whole-organism sampling of meiofaunal species will perforce include gut and surface microbiotal contamination, and our raw data contained bacterial and algal sequences. Careful filtering yielded a cleaned H. dujardini genome assembly, which we validated and annotated using GSSs, ESTs and RNA-Seq data, and which has better assembly metrics than the published, HGT-rich assembly. A small amount of additional microbial contamination likely remains in our 135 Mb assembly. Our assembly length is consistent with multiple empirical measurements of H. dujardini genome size, and is 120 Mb shorter than the HGT-rich version. Among 23,021 protein-coding gene predictions we found 216 genes (0.9%) with similarity to prokaryotes, suggestive of HGT; 196 of these were expressed. We also identified ~400 genes (<2%) that could be HGT from other non-metazoan eukaryotes. Cross-comparison of the assemblies, using raw read and RNA-Seq data, confirmed that the overwhelming majority of the putative HGT candidates in the previous genome were predicted from scaffolds at very low coverage and were not transcribed. Crucially, much of the natural contamination in the two projects was non-overlapping, confirming that it is foreign to the shared target animal genome.
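The percentages quoted in the Results follow directly from the gene counts. The short check below recomputes them; the names are illustrative, and the ~400 eukaryote-like genes are treated as exactly 400 for the purpose of the sketch.

```python
# Gene counts as reported in the Results (the ~400 figure is approximate)
TOTAL_GENES = 23_021
PROKARYOTE_LIKE = 216
PROKARYOTE_LIKE_EXPRESSED = 196
NONMETAZOAN_EUKARYOTE_LIKE = 400  # assumption: "~400" taken as 400


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


prok_pct = percent(PROKARYOTE_LIKE, TOTAL_GENES)
euk_pct = percent(NONMETAZOAN_EUKARYOTE_LIKE, TOTAL_GENES)
expressed_pct = percent(PROKARYOTE_LIKE_EXPRESSED, PROKARYOTE_LIKE)

print(f"prokaryote-like genes: {prok_pct:.1f}% of all genes")                # 0.9%
print(f"non-metazoan eukaryote-like genes: {euk_pct:.1f}% of all genes")     # 1.7%
print(f"prokaryote-like genes that are expressed: {expressed_pct:.1f}%")     # 90.7%
```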
Conclusions We find no support for massive horizontal gene transfer into the genome of H. dujardini. Many of the bacterial sequences in the previously published genome were not present in our raw reads. In constructing our assembly we removed most, but still not all, contamination using approaches derived from metagenomics, which we show are well suited to meiofaunal species. We conclude that HGT into H. dujardini accounts for 1–2% of genes and that the proposal that 17% of tardigrade genes originate from HGT events is an artefact of undetected contamination.
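One signal the abstract relies on is read coverage: scaffolds from contaminants tend to sit at coverage very different from the target genome, and most putative HGT candidates in the earlier assembly lay on very-low-coverage scaffolds. The sketch below illustrates a metagenomics-style screen that flags scaffolds by coverage relative to the assembly median and by a scaffold-level taxonomic assignment. It is a hypothetical illustration, not the pipeline used in this study: the `Scaffold` record, the `flag_suspect` function, the 10%-of-median cut-off and the taxon labels are all assumptions. Because the taxon is assigned to the whole scaffold, a genuine HGT gene sitting on an otherwise animal scaffold would not be flagged.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Sequence


@dataclass
class Scaffold:
    name: str
    coverage: float  # mean read depth from mapping raw reads back to the scaffold
    gc: float        # GC fraction (0..1); carried for GC-coverage plots, unused in the rule
    taxon: str       # hypothetical scaffold-level assignment, e.g. "Metazoa" or "Bacteria"


def flag_suspect(
    scaffolds: Sequence[Scaffold],
    min_cov_fraction: float = 0.1,            # assumed cut-off, not from the study
    foreign: Sequence[str] = ("Bacteria", "Archaea"),
) -> List[str]:
    """Return names of scaffolds that look like contamination.

    A scaffold is flagged if its coverage is below min_cov_fraction times
    the assembly-wide median coverage, or if the scaffold as a whole is
    assigned to a foreign taxon.
    """
    med = median(s.coverage for s in scaffolds)
    return [
        s.name
        for s in scaffolds
        if s.coverage < min_cov_fraction * med or s.taxon in foreign
    ]


# Toy data, invented for illustration only
example = [
    Scaffold("s1", 50.0, 0.45, "Metazoa"),
    Scaffold("s2", 48.0, 0.44, "Metazoa"),
    Scaffold("s3", 2.0, 0.60, "Metazoa"),     # very low coverage
    Scaffold("s4", 300.0, 0.65, "Bacteria"),  # foreign taxon
]
print(flag_suspect(example))  # ['s3', 's4']
```

In practice such thresholds are chosen by inspecting GC-coverage plots of the whole assembly rather than fixed in advance.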