TY - JOUR
T1 - Using transfer learning and dimensionality reduction techniques to improve generalisability of machine-learning predictions of mosquito ages from mid-infrared spectra
JF - bioRxiv
DO - 10.1101/2022.07.26.501594
SP - 2022.07.26.501594
AU - Emmanuel P. Mwanga
AU - Doreen J. Siria
AU - Joshua Mitton
AU - Issa H. Mshani
AU - Mario Gonzalez Jimenez
AU - Prashanth Selvaraj
AU - Klaas Wynne
AU - Francesco Baldini
AU - Fredros O. Okumu
AU - Simon A. Babayan
Y1 - 2022/01/01
UR - http://biorxiv.org/content/early/2022/07/28/2022.07.26.501594.abstract
N2 - Accurate prediction of mosquito population age structures can improve the evaluation of mosquito-targeted interventions since old mosquitoes are more likely to transmit malaria than young ones. Mid-infrared spectroscopy (MIRS) reveals age-associated variation in the biochemical composition of the mosquito cuticle, which can then be used to train machine learning (ML) models to predict mosquito ages. However, these MIRS-ML models are not always generalisable across different mosquito populations. Here, we investigated whether dimensionality reduction applied to the MIRS input data and transfer learning could improve the generalisability of MIRS-ML predictions for mosquito ages. We reared adults of the malaria vector, Anopheles arabiensis, in two insectaries (Ifakara, Tanzania and Glasgow, UK). The heads and thoraces of female mosquitoes of two age classes (1-9 day-olds and 10-17 day-olds) were scanned using an attenuated total reflection-Fourier transform infrared (ATR-FTIR) spectrometer (4000 cm-1 to 400 cm-1). The dimensionality of the spectra data was reduced using unsupervised principal component analysis (PCA) or t-distributed stochastic neighbour embedding (t-SNE), and then the spectra were used to train deep learning (DL) and standard machine learning (ML) classifiers.
Transfer learning was also evaluated for improving the generalisability of the models when predicting mosquito age classes from new populations. Model accuracies for predicting the age of test mosquitoes from the same insectary as the training samples reached 99% for DL and 92% for ML, but did not generalise to a different insectary, achieving only 46% and 48% for ML and DL, respectively. Dimensionality reduction did not improve model generalisability between locations but reduced computational time by up to 5-fold. However, transfer learning by updating pre-trained models with 2% of mosquitoes from the alternate location brought both DL and standard ML model performance to ~98% accuracy for predicting mosquito age classes in the alternative insectary. Combining dimensionality reduction and transfer learning can reduce computational costs and improve the transferability of both deep learning and standard machine learning models for predicting the age of mosquitoes. Future studies could investigate the optimal quantities and diversity of training data necessary for transfer learning, and implications for broader generalisability to unseen datasets. Competing Interest Statement: The authors have declared no competing interest. Abbreviations: CNN, convolutional neural network; ITNs, insecticide-treated nets; PCA, principal component analysis; t-SNE, t-distributed stochastic neighbour embedding.
ER -