RT Journal Article
SR Electronic
T1 Uncovering Medical Insights from Vast Amounts of Biomedical Data in Clinical Case Reports
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 172460
DO 10.1101/172460
A1 Yijiang Zhou
A1 David A. Liem
A1 Jessica M. Lee
A1 Quan Cao
A1 Brian Bleakley
A1 J. Harry Caufield
A1 Sanjana Murali
A1 Wei Wang
A1 Li Zhang
A1 Alex Bui
A1 Yizhou Sun
A1 Karol E. Watson
A1 Jiawei Han
A1 Peipei Ping
YR 2017
UL http://biorxiv.org/content/early/2017/08/04/172460.1.abstract
AB Clinical case reports (CCRs) have a time-honored tradition of serving as an important means of sharing clinical experiences on patients presenting with atypical disease phenotypes or receiving new therapies. However, the large body of accumulated case reports consists of isolated, unstructured, and heterogeneous clinical data, which poses a great challenge to clinicians and researchers mining relevant information through existing indexing tools. To render CCRs more findable, accessible, interoperable, and reusable (FAIR) for the biomedical community, we created a resource platform comprising a test dataset of 1000 CCRs spanning 14 disease phenotypes, a standardized metadata template and metrics, and a set of computational tools that automatically retrieve relevant medical information and analyze all published PubMed clinical case reports with respect to trends in publication journals, citation impact, MeSH terms, drug use, distributions of patient demographics, and relationships with other case reports and databases. Our standardized metadata template and CCR test dataset may be valuable resources for researchers who need a high-quality dataset to train and validate machine learning algorithms, and may thereby help advance medical science and improve patient care. In the future, our analytical tools may also be applied to other large clinical data sources.