PT - JOURNAL ARTICLE
AU - A. Bannach-Brown
AU - P. Przybyła
AU - J. Thomas
AU - A.S.C. Rice
AU - S. Ananiadou
AU - J. Liao
AU - M. R. Macleod
TI - The use of text-mining and machine learning algorithms in systematic reviews: reducing workload in preclinical biomedical sciences and reducing human screening error
AID - 10.1101/255760
DP - 2018 Jan 01
TA - bioRxiv
PG - 255760
4099 - http://biorxiv.org/content/early/2018/01/31/255760.short
4100 - http://biorxiv.org/content/early/2018/01/31/255760.full
AB - Background In this paper we outline a method of applying machine learning (ML) algorithms to aid citation screening in an ongoing broad and shallow systematic review, with the aim of achieving a high-performing algorithm comparable to human screening. Methods We tested a range of machine learning algorithms. We applied ML algorithms to incrementally increasing numbers of training records and recorded sensitivity and specificity on an unseen validation set of papers. The performance of these algorithms was assessed on measures of recall, specificity, and accuracy. The best-performing algorithm was applied to the remaining unseen records in the dataset, and its classification results will be taken forward to the next stage of the systematic review. ML was used to identify potential human errors during screening by analysing the training and validation datasets against the machine-ranked score. Results We found that ML algorithms perform at a desirable level. Classifiers reached 98.7% sensitivity based on learning from a training set of 5749 records, with an inclusion prevalence of 13.2% (see below). The highest level of specificity reached was 86%. Human errors in the training and validation sets were successfully identified using ML scores to highlight discrepancies. Training the ML algorithm on the corrected dataset improved the specificity of the algorithm without compromising sensitivity.
The error analysis yielded a 3% increase or change in sensitivity and specificity, which increased the precision and accuracy of the ML algorithm. Conclusions The technique of using ML to identify human error needs to be investigated in more depth; however, this pilot shows a promising approach to integrating human decisions and automation in systematic review methodology.
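The two ideas in the abstract, measuring sensitivity and specificity against human screening decisions and using machine scores to highlight possible human errors, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the 0.5 decision cutoff, the 0.1 and 0.9 discrepancy thresholds, and the toy data.

```python
from typing import List, Tuple


def sensitivity_specificity(human: List[int], predicted: List[int]) -> Tuple[float, float]:
    """Compute sensitivity (recall) and specificity, treating human labels as the reference."""
    tp = sum(1 for h, p in zip(human, predicted) if h == 1 and p == 1)
    fn = sum(1 for h, p in zip(human, predicted) if h == 1 and p == 0)
    tn = sum(1 for h, p in zip(human, predicted) if h == 0 and p == 0)
    fp = sum(1 for h, p in zip(human, predicted) if h == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def flag_discrepancies(human: List[int], scores: List[float],
                       low: float = 0.1, high: float = 0.9) -> List[int]:
    """Return indices of records whose human decision strongly disagrees with the machine score.

    The thresholds are hypothetical: an included record scored at or below `low`,
    or an excluded record scored at or above `high`, is flagged for re-screening.
    """
    flagged = []
    for i, (h, s) in enumerate(zip(human, scores)):
        if (h == 1 and s <= low) or (h == 0 and s >= high):
            flagged.append(i)
    return flagged


# Toy data (invented): 1 = included by the human screener, 0 = excluded.
human = [1, 0, 0, 1, 0, 1]
scores = [0.95, 0.05, 0.92, 0.40, 0.30, 0.03]

# Hypothetical 0.5 cutoff turning machine scores into include/exclude decisions.
predicted = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(human, predicted)
to_recheck = flag_discrepancies(human, scores)
```

In this sketch, flagged records would go back to a human screener for a second look; the corrected labels could then be used to retrain the classifier, which is the loop the abstract describes in outline.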