Abstract
Despite its abundance in the fossil record, grass pollen is largely overlooked as a source of ecological and evolutionary data because most Poaceae species cannot be differentiated using traditional optical microscopy. However, deep learning techniques can quantify the small variations in grass pollen morphology visible under superresolution microscopy. We use the abstracted morphological features output by deep learning to estimate the taxonomic diversity and physiology of fossil grass pollen assemblages. Using a semi-supervised learning strategy, we trained convolutional neural networks (CNNs) on pollen images of 60 widely distributed grass species and unlabeled fossil Poaceae. Semi-supervised learning improved the CNN models’ capability to generalize feature recognition in fossil pollen specimens. Our models successfully captured both the taxonomic diversity of an assemblage and morphological differences between C3 and C4 species. We applied our trained models to fossil grass pollen assemblages from a 25,000-year lake-sediment record from eastern equatorial Africa and correlated past shifts in grass diversity with atmospheric CO2 concentration and proxy records of local temperature, precipitation, and fire occurrence. We quantified grass diversity for each time window using morphological variability, calculating both Shannon entropy and morphotype counts from the specimens’ CNN features. Reconstructed C3:C4 ratios suggest a gradual increase in C4 grasses with rising temperature and fire activity across the late-glacial to Holocene transition. Our results demonstrate that quantitative machine-learned features of pollen morphology can significantly advance palynological analysis, enabling robust estimation of grass diversity and C3:C4 ratio in ancient grassland ecosystems.
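As a rough illustration of the two diversity measures named above, the following minimal sketch computes Shannon entropy and a morphotype count for one time window. It assumes each specimen has already been assigned a morphotype label, for example by grouping similar CNN feature vectors; the abstract does not say how morphotypes are delineated. The function name `morphotype_diversity`, the label input, and the use of the natural logarithm are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def morphotype_diversity(labels):
    """Return (Shannon entropy in nats, morphotype count) for one time window.

    `labels` is a hypothetical sequence of per-specimen morphotype labels;
    how such labels are derived from CNN features is assumed, not taken
    from the source.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # An empty window has no specimens and therefore no diversity.
        return 0.0, 0
    # Shannon entropy: H = -sum(p_i * ln p_i) over morphotype proportions p_i.
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return entropy, len(counts)


# Usage: one hypothetical window with six specimens in three morphotypes.
window = ["m1", "m1", "m2", "m2", "m3", "m3"]
entropy, richness = morphotype_diversity(window)
```

Entropy responds to how evenly specimens are spread across morphotypes, while the count reflects only how many morphotypes are present, so the two measures can diverge for the same window.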
Significance
The pollen grains of most grass species are morphologically indistinguishable under traditional optical microscopy, but we show that they can be differentiated through deep learning analysis of superresolution images. Abstracted morphological features derived from convolutional neural networks can be used to quantify the biological and physiological diversity of grass pollen assemblages without a priori knowledge of the species present, and to reconstruct past changes in the taxonomic diversity and relative abundance of C4 grasses in ancient grasslands. This approach unlocks ecological information that was previously unattainable from the fossil pollen record and demonstrates that deep learning can solve some of the most intractable identification problems in the reconstruction of past vegetation dynamics.
Competing Interest Statement
The authors have declared no competing interest.