PT  - JOURNAL ARTICLE
AU  - Masabho P. Milali
AU  - Maggy T. Sikulu-Lord
AU  - Samson S. Kiware
AU  - Floyd Dowell
AU  - George F. Corliss
AU  - Richard J. Povinelli
TI  - Age Grading An. gambiae and An. arabiensis Using Near Infrared Spectra and Artificial Neural Networks
AID  - 10.1101/490326
DP  - 2018 Jan 01
TA  - bioRxiv
PG  - 490326
4099  - http://biorxiv.org/content/early/2018/12/07/490326.short
4100  - http://biorxiv.org/content/early/2018/12/07/490326.full
AB  - Background: Near infrared spectroscopy (NIRS) currently complements existing techniques for age-grading mosquitoes. NIRS classifies lab-reared and semi-field raised mosquitoes as < 7 or ≥ 7 days old with an average accuracy of 80%, achieved by training a regression model using partial least squares (PLS) and interpreting it as a binary classifier.
      Methods and findings: We explore whether using an artificial neural network (ANN) instead of PLS regression improves the current accuracy of NIRS models for age-grading malaria-transmitting mosquitoes. We also explore whether directly training a binary classifier, instead of training a regression model and interpreting it as a binary classifier, improves accuracy. A total of 786 and 870 NIR spectra collected from laboratory-reared An. gambiae and An. arabiensis, respectively, were used and pre-processed according to previously published protocols. Based on ten-fold Monte Carlo cross-validation, an ANN regression model achieved a root mean squared error (RMSE) of 1.6 ± 0.2 for An. gambiae and 2.8 ± 0.2 for An. arabiensis, whereas the PLS regression model achieved an RMSE of 3.7 ± 0.2 for An. gambiae and 4.5 ± 0.1 for An. arabiensis. When we interpreted the regression models as binary classifiers, the ANN regression model achieved an accuracy of 93.7 ± 1.0% for An. gambiae and 90.2 ± 1.7% for An. arabiensis, while the PLS regression model achieved 83.9 ± 2.3% for An. gambiae and 80.3 ± 2.1% for An. arabiensis. We also find that a directly trained binary classifier yields higher age-estimation accuracy than a regression model interpreted as a binary classifier. A directly trained ANN binary classifier achieved an accuracy of 99.4 ± 1.0% for An. gambiae and 99.0 ± 0.6% for An. arabiensis, while a directly trained PLS binary classifier achieved 93.6 ± 1.2% for An. gambiae and 88.7 ± 1.1% for An. arabiensis.
      Conclusion: Training both regression and binary classification age models using ANNs yields higher estimation accuracies than training the same age models using PLS. Regardless of the model architecture, directly trained binary classifiers classify mosquito age more accurately than regression models interpreted as binary classifiers. Therefore, we recommend training models to estimate the age of An. gambiae and An. arabiensis using ANN architectures, and directly training binary classifiers instead of training regression models and interpreting them as binary classifiers.
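The abstract's evaluation steps (RMSE of a regression model, reading that regression model as a < 7 versus ≥ 7 day binary classifier, and scoring over repeated Monte Carlo cross-validation splits) can be sketched as follows. This is a minimal, hypothetical sketch in plain Python, not the authors' code: it assumes predicted ages are in days, reads "ten-fold Monte Carlo cross-validation" as ten random train/test splits, assumes a 30% test fraction, and assumes the reported "±" values are standard deviations across repeats. All function names are illustrative.

```python
import math
import random
import statistics

AGE_THRESHOLD_DAYS = 7  # cut-off from the abstract: < 7 vs >= 7 days old


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ages (days)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def to_age_class(ages, threshold=AGE_THRESHOLD_DAYS):
    """Map ages in days to binary labels: 0 for < threshold, 1 for >= threshold."""
    return [1 if a >= threshold else 0 for a in ages]


def regression_as_classifier_accuracy(y_true, y_pred, threshold=AGE_THRESHOLD_DAYS):
    """Accuracy (%) of a regression model interpreted as a binary classifier.

    Both true and predicted ages are thresholded at the cut-off, and the
    percentage of samples whose classes match is returned.
    """
    true_cls = to_age_class(y_true, threshold)
    pred_cls = to_age_class(y_pred, threshold)
    hits = sum(t == p for t, p in zip(true_cls, pred_cls))
    return 100.0 * hits / len(true_cls)


def monte_carlo_splits(n_samples, n_repeats=10, test_fraction=0.3, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random splits.

    Hypothetical Monte Carlo cross-validation: unlike k-fold CV, the test
    sets of different repeats may overlap.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    n_test = max(1, int(round(test_fraction * n_samples)))
    for _ in range(n_repeats):
        rng.shuffle(indices)
        yield sorted(indices[n_test:]), sorted(indices[:n_test])


def summarize(scores):
    """Return (mean, sample standard deviation) for 'mean ± sd' reporting."""
    return statistics.mean(scores), statistics.stdev(scores)
```

For example, with true ages [1, 3, 5, 7, 9, 11] and predictions [2, 3, 8, 6, 10, 12], two of the six samples fall on the wrong side of the 7-day cut-off, so the regression model read as a classifier scores about 66.7% accuracy even though its RMSE is only about 1.47 days. In a full pipeline, each metric would be computed on every test split and then passed to `summarize`.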