Abstract
Phylogenetic regression is a form of Generalized Least Squares (GLS) that incorporates a covariance matrix derived from the evolutionary relationships among species (i.e., their phylogenetic relationships). Although this method is widely used for hypothesis testing in phylogenetic comparative methods, such as phylogenetic ANOVA, its ability to account for non-linear relationships has received little attention.
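For readers less familiar with GLS, the estimator behind phylogenetic regression can be sketched as beta = (X^T V^-1 X)^-1 X^T V^-1 y, where V is the phylogenetic covariance matrix. The minimal NumPy sketch below is illustrative only and is not taken from phyloKRR. It assumes a Brownian-motion model, under which V[i, j] is the branch length shared by species i and j. The four-species tree, the predictor values, and the function name `gls_fit` are all hypothetical.

```python
import numpy as np

def gls_fit(X, y, V):
    """GLS estimate: beta = (X^T V^-1 X)^-1 X^T V^-1 y (hypothetical helper)."""
    # Apply V^-1 through linear solves instead of forming the inverse explicitly
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)

# Hypothetical ultrametric tree ((A,B),(C,D)) with root-to-tip depth 1.0.
# Assumed Brownian motion: V[i, j] = branch length shared by species i and j.
V = np.array([
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.4],
    [0.0, 0.0, 0.4, 1.0],
])
x = np.array([0.5, 1.0, 1.5, 2.0])          # hypothetical predictor
X = np.column_stack([np.ones_like(x), x])    # intercept + one predictor
y = np.array([1.1, 2.0, 2.9, 4.1])           # hypothetical response

beta = gls_fit(X, y, V)
print(beta)  # [intercept, slope]
```

With V equal to the identity matrix, the same function reduces to ordinary least squares. This is the sense in which the phylogeny enters the model only through V.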
To address this gap, we applied GLS in a high-dimensional feature space, using linear combinations of transformed data to capture non-linearity, a common approach in kernel regression. We analyzed two biological datasets with both Radial Basis Function (RBF) and linear kernels. The first dataset contained morphometric data, while the second comprised discrete trait data with diversification rates as labels. Model hyperparameters were tuned by cross-validation within the training set.
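The abstract does not give the exact estimator. One way to write ridge-regularized GLS in a kernel feature space is beta = (Phi^T V^-1 Phi + lam I)^-1 Phi^T V^-1 y. By the push-through identity, this equals Phi^T alpha with dual coefficients alpha = (K + lam V)^-1 y, where K is the kernel matrix. The sketch below implements that formulation with RBF and linear kernels and tunes lam and gamma by k-fold cross-validation scored with RMSE. It is an assumption-laden illustration, not phyloKRR's implementation: the function names, the random (non-phylogenetic) fold assignment, the kernel-only predictions, and the star-like tree are all hypothetical choices.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # RBF kernel: k(a, b) = exp(-gamma * ||a - b||^2)
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negatives from rounding

def linear_kernel(A, B, gamma=None):
    # Linear kernel: k(a, b) = a . b (gamma ignored; kept for a uniform signature)
    return A @ B.T

def phylo_krr_fit(K, y, V, lam):
    # Assumed ridge-penalized GLS in feature space. With W = V^-1, the primal
    # solution (Phi^T W Phi + lam I)^-1 Phi^T W y equals Phi^T alpha, where
    # alpha = (K + lam V)^-1 y. Setting V = I recovers ordinary kernel ridge.
    return np.linalg.solve(K + lam * V, y)

def cv_rmse(X, y, V, kernel, lam, gamma, n_folds=5, seed=0):
    # Mean RMSE over random k-fold splits of the training data
    rng = np.random.default_rng(seed)
    idx = np.arange(len(y))
    errs = []
    for test in np.array_split(rng.permutation(idx), n_folds):
        train = np.setdiff1d(idx, test)
        K = kernel(X[train], X[train], gamma)
        alpha = phylo_krr_fit(K, y[train], V[np.ix_(train, train)], lam)
        # Predict from the kernel term only (no phylogenetic correction of residuals)
        pred = kernel(X[test], X[train], gamma) @ alpha
        errs.append(np.sqrt(np.mean((y[test] - pred) ** 2)))
    return float(np.mean(errs))

def tune(X, y, V, kernel, lams, gammas):
    # Grid search: return the (lam, gamma) pair with the lowest mean CV RMSE
    scores = {(l, g): cv_rmse(X, y, V, kernel, l, g) for l in lams for g in gammas}
    return min(scores, key=scores.get)

# Usage with simulated data and a hypothetical star-like tree
# (all species share a stem of length 0.5, then a terminal branch of 0.5).
rng = np.random.default_rng(1)
n = 40
X = rng.uniform(-3, 3, size=(n, 1))
V = 0.5 * np.eye(n) + 0.5 * np.ones((n, n))
y = np.sin(X[:, 0]) + rng.multivariate_normal(np.zeros(n), 0.05 * V)
lam, gamma = tune(X, y, V, rbf_kernel, lams=[1e-3, 1e-2, 1e-1, 1.0], gammas=[0.1, 1.0, 10.0])
print(lam, gamma, cv_rmse(X, y, V, rbf_kernel, lam, gamma))
```

In this formulation, the phylogeny changes only the regularization term, from lam I to lam V. With the linear kernel, the fit reduces to a ridge-penalized version of the ordinary GLS estimator. A fuller treatment might also use phylogenetically structured folds and add the covariance between test and training residuals to the predictions.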
In the biological datasets tested, regularized kernels reduced prediction error (measured as RMSE) by about 20% compared with linear regression when the data did not exhibit linear relationships. In simulated datasets, the error decreased almost exponentially with the level of non-linearity.
These results show that introducing kernels into phylogenetic regression provides a novel and promising tool to complement phylogenetic comparative methods. We have implemented this method in a Python package named phyloKRR, which is freely available at: https://github.com/Ulises-Rosas/phylokrr.
Competing Interest Statement
The authors have declared no competing interest.