Abstract
Background In public health research, there is a need to close the gap between care delivery and cohort identification. Dedicated tagging staff must devote considerable effort to assigning clinical codes after reading patient summaries. Machine learning automation can facilitate the classification of these clinical narratives, but the limited availability of electronic medical records remains a bottleneck. Veterinary medical records are a largely untapped data source that could benefit both human and non-human patients, yet very few approaches currently use them.
Methods In this retrospective cross-sectional and chart review study, we trained separate long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human records and 89,591 veterinary records, tested the models’ efficacy in a standard train-test split setup, and probed the portability of these models across species domains. We trained versions of our models first on the free-text clinical narratives, and then on only the clinically relevant terms extracted by MetaMap Lite, a natural language processing tool intended for this purpose.
Findings We show that our LSTM approach classifies veterinary records across top-level codes (F1 score = 0·83) and identifies top-level neoplasia records in the veterinary data (F1 score = 0·93). The model trained on veterinary data can be ported to identify neoplasia records in the human data (F1 score = 0·70).
Interpretation Our findings suggest that free-text clinical narratives can be used to learn classification models that allow the rapid identification of patient cohorts. Ultimately, this effort can lead to new insights that address emerging public health concerns. Digitization of health information will continue in both human and veterinary medicine; our approach serves as a first proof of concept of how these two domains can learn from, and inform, one another.
Funding Stanford University and the Chan Zuckerberg Biohub Investigator Award