RT Journal Article
SR Electronic
T1 Machine Learning Maps Research Needs in COVID-19 Literature
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.06.11.145425
DO 10.1101/2020.06.11.145425
A1 Anhvinh Doanvo
A1 Xiaolu Qian
A1 Divya Ramjee
A1 Helen Piontkivska
A1 Angel Desai
A1 Maimuna Majumder
YR 2020
UL http://biorxiv.org/content/early/2020/06/12/2020.06.11.145425.abstract
AB Summary: Manually assessing the scope of the thousands of publications on the COVID-19 (coronavirus disease 2019) pandemic is an overwhelming task. Shortcuts through metadata analysis (e.g., keywords) assume that studies are properly tagged. However, machine learning approaches can rapidly survey the actual text of coronavirus abstracts to identify research overlap between COVID-19 and other coronavirus diseases, research hotspots, and areas warranting exploration. We propose a fast, scalable, and reusable framework to parse novel disease literature. When applied to the COVID-19 Open Research Dataset (CORD-19), dimensionality reduction suggested that COVID-19 studies to date are primarily clinical-, modeling-, or field-based, in contrast to the vast quantity of laboratory-driven research for other (non-COVID-19) coronavirus diseases. Topic modeling also indicated that COVID-19 publications have thus far focused primarily on public health, outbreak reporting, clinical care, and testing for coronaviruses, as opposed to the more limited number focused on basic microbiology, including pathogenesis and transmission. Competing Interest Statement: The authors have declared no competing interest.