Abstract
Shotgun metagenomics is a powerful tool for studying the genomic traits of microbial community members, such as genome size and gene content. While such traits can be used to better understand the ecology and evolution of microbial communities, the accuracy of their estimates can be critically influenced by both known and unknown factors. One factor that can bias trait estimates is the proportion of eukaryotic DNA in a metagenome, as some bioinformatic tools assume that all DNA reads in a metagenome are non-eukaryotic. Here, we use a new bioinformatic tool to help resolve a recent debate about the influence of eukaryotic DNA on the estimation of average genome size in a global soil sample dataset. Contrary to what was assumed, our reanalysis of this dataset revealed that soil samples can contain a substantial proportion of eukaryotic DNA (∼38.8%), which severely inflated average genome size estimates. We report that correcting for this bias significantly improves the statistical support for the negative relationship between average bacterial genome size and soil pH. These results highlight that metagenomes can contain large quantities of eukaryotic DNA, and that new methods that correct for this can improve microbial trait estimation.
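To illustrate why unaccounted eukaryotic DNA can inflate average genome size (AGS), the following is a minimal sketch under a deliberately simplified model, not the tool or method used in this study. It assumes that AGS is computed as total sequenced base pairs divided by the number of genome equivalents, and that genome equivalents are inferred from prokaryotic marker genes only, so eukaryotic reads add to the numerator but not the denominator. The function names and the 5 Mbp "true" AGS are hypothetical; only the 38.8% eukaryotic fraction comes from the abstract.

```python
def inflated_ags(true_ags_bp: float, eukaryotic_fraction: float) -> float:
    """AGS reported when every read is assumed to be non-eukaryotic.

    Simplifying assumption: eukaryotic reads count towards total base
    pairs but contribute no prokaryotic genome equivalents.
    """
    if not 0.0 <= eukaryotic_fraction < 1.0:
        raise ValueError("eukaryotic_fraction must be in [0, 1)")
    return true_ags_bp / (1.0 - eukaryotic_fraction)


def corrected_ags(naive_ags_bp: float, eukaryotic_fraction: float) -> float:
    """Undo the inflation by discounting the eukaryotic share of base pairs."""
    if not 0.0 <= eukaryotic_fraction < 1.0:
        raise ValueError("eukaryotic_fraction must be in [0, 1)")
    return naive_ags_bp * (1.0 - eukaryotic_fraction)


if __name__ == "__main__":
    true_ags = 5_000_000.0   # hypothetical true prokaryotic AGS, in bp
    euk_fraction = 0.388     # eukaryotic proportion quoted in the abstract

    naive = inflated_ags(true_ags, euk_fraction)
    print(f"naive AGS:     {naive:,.0f} bp")
    print(f"inflation:     {naive / true_ags:.2f}x")
    print(f"corrected AGS: {corrected_ags(naive, euk_fraction):,.0f} bp")
```

Under this toy model, a eukaryotic fraction of 38.8% inflates the naive estimate by a factor of 1 / (1 − 0.388), roughly 1.63. Real AGS estimators are more involved, so the exact magnitude of the bias in practice may differ.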
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Broadened the abstract. Minor main text changes.
https://github.com/EisenRa/2024_soil_dark_matter_reply/tree/main