RT Journal Article
SR Electronic
T1 Functional data analysis techniques to improve the generalizability of near-infrared spectral data for monitoring mosquito populations
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.04.28.058495
DO 10.1101/2020.04.28.058495
A1 Pedro M. Esperança
A1 Dari F. Da
A1 Ben Lambert
A1 Roch K. Dabiré
A1 Thomas S. Churcher
YR 2020
UL http://biorxiv.org/content/early/2020/04/29/2020.04.28.058495.abstract
AB Near infrared spectroscopy is increasingly being used as an economical method to monitor mosquito vector populations in support of disease control. Despite this rise in popularity, strong geographical variation in spectra has proven an issue for generalising predictions from one location to another. Here, we use a functional data analysis approach, which models spectra as smooth curves rather than as a discrete set of points, to develop a method that is robust to geographic heterogeneity. Specifically, we use a penalised generalised linear modelling framework which includes efficient functional representation of spectra, spectral smoothing and regularisation. To ensure better generalisation of model predictions from one training set to another, we use cross-validation procedures favouring smoother representation of spectra. To illustrate the performance of our approach, we collected spectra for field-caught specimens of Anopheles gambiae complex mosquitoes – the most epidemiologically important vector species on the planet – in two sites in Burkina Faso. Using these spectra, we show how models trained on data from one site can successfully classify morphologically identical sibling species in another site, over 250 km away. Whilst we apply our framework to species prediction, our unified statistical framework can, alternatively, handle regression analysis (for example, to determine mosquito age) and other types of multinomial classification (for example, to determine infection status).
To make our methods readily available for field entomologists, we have created an open-source R package mlevcm. All data used are also publicly available. Competing Interest Statement: The authors have declared no competing interest.