Abstract
Inference of spatial patterns of genetic structure often relies on parameter estimation and model evaluation using a set of summary statistics (SS) that summarise the information present in the data. An important subset of these SS is best described as diversity indices grounded in information theory, which can be classified into three ‘families’ spanning a spectrum of information measures, qH. These are the richness family of order q = 0, ArSS; the Shannon family of order q = 1, HSS; and the heterozygosity family of order q = 2, HeSS. Although commonly used by ecologists, the Shannon family has been largely neglected by population geneticists and evolutionary biologists. Recent population genetic studies have advocated its use, yet the power of these SS for discriminating spatial structure has not been systematically assessed.
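As an illustration of how the three orders relate, the following minimal Python sketch computes the diversity of order q as a Hill number (an effective number of alleles) from the allele frequencies at a single locus. The function names and the example frequencies are hypothetical and are not taken from the study; the sketch assumes the standard definitions, in which order 0 gives allelic richness, order 1 gives the exponential of Shannon entropy, and order 2 gives the inverse of expected homozygosity.

```python
import math

def hill_number(freqs, q):
    """Hill number (effective number of alleles) of order q.

    freqs: allele frequencies or counts at one locus (hypothetical input format).
    """
    p = [f for f in freqs if f > 0]          # absent alleles contribute nothing
    total = sum(p)
    p = [f / total for f in p]               # normalise so frequencies sum to 1
    if q == 1:
        # Order 1 is defined as the limit q -> 1: exp(Shannon entropy)
        return math.exp(-sum(f * math.log(f) for f in p))
    return sum(f ** q for f in p) ** (1.0 / (1.0 - q))

def shannon_entropy(freqs):
    """Untransformed Shannon entropy H = ln(Hill number of order 1)."""
    return math.log(hill_number(freqs, 1))

def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) = 1 - 1 / (Hill number of order 2)."""
    return 1.0 - 1.0 / hill_number(freqs, 2)

if __name__ == "__main__":
    freqs = [0.5, 0.25, 0.25]                # made-up example frequencies
    for q in (0, 1, 2):
        print(q, round(hill_number(freqs, q), 3))
```

With equal allele frequencies all three orders return the number of alleles; with uneven frequencies the value declines as q increases, because higher orders give more weight to common alleles.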
In this study, we performed a comprehensive assessment of the three families of SS, as well as a fourth family consisting of Shannon-family SS expressed in terms of Hill numbers, for spatial structure inference using simulated microsatellite data under typical spatial scenarios. To give an unbiased evaluation, we used three machine learning methods, kernel local Fisher discriminant analysis (KLFDA), random forest classification (RFC), and a deep neural network (DL), to test how well different SS discriminate between spatial scenarios, and then identified the most informative metrics.
Results showed that the SS family of order q = 1 expressed in terms of Hill numbers outperformed the other two families (ArSS, HeSS) as well as the untransformed Shannon entropy (HSS) family. Jaccard dissimilarity (J) and its Mantel’s r showed the highest power to discriminate among all spatial scenarios, followed by Shannon differentiation ΔD and its Mantel’s r.
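For readers unfamiliar with Jaccard dissimilarity, the sketch below shows the textbook form applied to the sets of alleles observed in two populations at one locus. This is an assumption about how the statistic is applied; the exact formulation used in the study (for example, how loci are combined) may differ, and the function name and allele sizes are hypothetical.

```python
def jaccard_dissimilarity(alleles_a, alleles_b):
    """Textbook Jaccard dissimilarity: 1 - |A ∩ B| / |A ∪ B|.

    alleles_a, alleles_b: iterables of allele labels observed in each
    population at one locus (hypothetical input format).
    """
    a, b = set(alleles_a), set(alleles_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets are identical
        return 0.0
    return 1.0 - len(a & b) / len(union)

if __name__ == "__main__":
    # Made-up microsatellite allele sizes for two populations
    pop1 = [150, 152, 154]
    pop2 = [152, 154, 156, 158]
    print(round(jaccard_dissimilarity(pop1, pop2), 3))
```

Here the two populations share two of five distinct alleles, so the dissimilarity is 1 - 2/5 = 0.6. A value of 0 means identical allele sets and 1 means no shared alleles.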
Information-based summary statistics, especially the diversity of order q = 1 and Shannon differentiation measures, can increase the power of spatial structure inference. In addition, different sets of SS provide complementary power for discriminating between spatial scenarios.
Competing Interest Statement
The authors have declared no competing interest.