Abstract
The filamentous fungal family Aspergillaceae contains > 1,000 known species, mostly in the genera Aspergillus and Penicillium. Fungi in Aspergillaceae display a wide range of lifestyles, several of which are relevant to human affairs. For example, some species are used as industrial workhorses, food fermenters, or platforms for drug discovery (e.g., Aspergillus niger, Penicillium camemberti), while others are dangerous human and plant pathogens (e.g., Aspergillus fumigatus, Penicillium digitatum). Reconstructing the phylogeny and timeline of the family's diversification is the first step toward understanding how this diverse range of lifestyles evolved. To infer a robust phylogeny for Aspergillaceae, pinpoint poorly resolved branches, and identify the likely causes of their poor resolution, we used 81 genomes spanning the diversity of Aspergillus and Penicillium to construct a 1,668-gene data matrix. Phylogenies of the nucleotide and amino acid versions of this full data matrix were inferred using three different maximum likelihood-based schemes (gene-partitioned, unpartitioned, and coalescent-based). We also used the same three schemes to infer phylogenies from five additional 834-gene data matrices, each constructed by subsampling the top 50% of genes according to a different criterion associated with strong phylogenetic signal (alignment length, average bootstrap value, taxon completeness, treeness/relative composition variability, and number of variable sites). Examination of the topological agreement among these 36 phylogenies (6 data matrices × 2 character types × 3 schemes), together with measures of internode certainty, identified 12 of 78 bipartitions (15.4%) as incongruent.
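The gene-subsampling step and the bookkeeping behind the 36 phylogenies can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the matrix labels and function names are hypothetical; treeness is taken as the fraction of total tree length on internal branches; RCV uses one common formulation (summed absolute deviation of per-taxon character-state counts from the across-taxon mean, divided by number of taxa times alignment length); and every criterion is assumed to rank higher values as stronger phylogenetic signal.

```python
from itertools import product

# Hypothetical labels for the six data matrices, two character types,
# and three inference schemes described in the text.
MATRICES = ["full", "alignment_length", "avg_bootstrap",
            "taxon_completeness", "treeness_rcv", "variable_sites"]
CHARACTER_TYPES = ["NT", "AA"]
SCHEMES = ["gene_partitioned", "unpartitioned", "coalescent"]


def treeness(internal_lengths, terminal_lengths):
    """Fraction of total tree length that lies on internal branches."""
    internal = sum(internal_lengths)
    return internal / (internal + sum(terminal_lengths))


def rcv(alignment, alphabet="ACGT"):
    """Relative composition variability of an alignment (taxon -> sequence).

    For each character state, sums the absolute deviation of each taxon's
    count from the mean count across taxa, then divides the grand total by
    (number of taxa * alignment length). Assumption for this sketch:
    characters outside `alphabet` (gaps, ambiguity codes) are not counted.
    """
    seqs = list(alignment.values())
    n_taxa = len(seqs)
    length = len(seqs[0])
    total = 0.0
    for state in alphabet:
        counts = [seq.count(state) for seq in seqs]
        mean = sum(counts) / n_taxa
        total += sum(abs(c - mean) for c in counts)
    return total / (n_taxa * length)


def top_half(scores):
    """IDs of the top 50% of genes, assuming higher scores mean stronger signal."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[: len(ranked) // 2]


if __name__ == "__main__":
    analyses = list(product(MATRICES, CHARACTER_TYPES, SCHEMES))
    print(len(analyses))            # 36: 6 matrices x 2 character types x 3 schemes
    print(1668 // 2)                # 834 genes retained per subsampled matrix
    print(round(100 * 12 / 78, 1))  # 15.4 percent of bipartitions incongruent
```

For the treeness/RCV criterion, a gene's score would be `treeness(...) / rcv(...)` computed from its own gene tree and alignment, and `top_half` would then be applied to the resulting dictionary of 1,668 scores.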
Patterns of incongruence across these 12 bipartitions fell into three categories: (i) low levels of incongruence for 2 shallow bipartitions, most likely stemming from incomplete lineage sorting; (ii) high levels of incongruence for 3 shallow bipartitions, most likely stemming from hybridization or introgression (or very high levels of incomplete lineage sorting); and (iii) varying levels of incongruence for 7 deeper bipartitions, most likely stemming from reconstruction artifacts associated with poor taxon sampling. Relaxed molecular clock analyses suggest that Aspergillaceae originated in the Lower Cretaceous, 125.1 million years ago (mya; 95% confidence interval (CI): 146.7–102.1), with the origins of the genera Aspergillus and Penicillium dating back to 84.3 mya (95% CI: 90.9–77.6) and 77.4 mya (95% CI: 94.0–61.0), respectively. Our results provide a robust evolutionary and temporal framework for comparative genomic analyses in Aspergillaceae, while our general approach offers a widely applicable template for the phylogenomic identification of resolved and contentious branches in densely genome-sequenced lineages across the tree of life.
Abbreviations
- NT: nucleotide
- AA: amino acid
- CI: confidence interval
- RCV: relative composition variability
- IC: internode certainty
- GSF: gene support frequencies
- GLS: gene-wise log-likelihood scores
- DVMC: degree of violation of a molecular clock