RT Journal Article
SR Electronic
T1 Machine Learning Maps Research Needs in COVID-19 Literature
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.06.11.145425
DO 10.1101/2020.06.11.145425
A1 Anhvinh Doanvo
A1 Xiaolu Qian
A1 Divya Ramjee
A1 Helen Piontkivska
A1 Angel Desai
A1 Maimuna Majumder
YR 2020
UL http://biorxiv.org/content/early/2020/06/12/2020.06.11.145425.abstract
AB Summary: Manually assessing the scope of the thousands of publications on the COVID-19 (coronavirus disease 2019) pandemic is an overwhelming task. Shortcuts through metadata analysis (e.g., keywords) assume that studies are properly tagged. However, machine learning approaches can rapidly survey the actual text of coronavirus abstracts to identify research overlap between COVID-19 and other coronavirus diseases, research hotspots, and areas warranting exploration. We propose a fast, scalable, and reusable framework to parse novel disease literature. When applied to the COVID-19 Open Research Dataset (CORD-19), dimensionality reduction suggested that COVID-19 studies to date are primarily clinical-, modeling-, or field-based, in contrast to the vast quantity of laboratory-driven research for other (non-COVID-19) coronavirus diseases. Topic modeling also indicated that COVID-19 publications have thus far focused primarily on public health, outbreak reporting, clinical care, and testing for coronaviruses, as opposed to the more limited number focused on basic microbiology, including pathogenesis and transmission. Competing Interest Statement: The authors have declared no competing interest.