PT - JOURNAL ARTICLE
AU - Gandy, Lisa
AU - Gumm, Jordan
AU - Fertig, Benjamin
AU - Kennish, Michael J.
AU - Chavan, Sameer
AU - Thessen, Ann
AU - Marchionni, Luigi
AU - Xia, Xiaoxan
AU - Shankrit, Shambhavi
AU - Fertig, Elana J
TI - Synthesizer: Expediting synthesis studies from context-free data with natural language processing
AID - 10.1101/053629
DP - 2016 Jan 01
TA - bioRxiv
PG - 053629
4099 - http://biorxiv.org/content/early/2016/05/16/053629.short
4100 - http://biorxiv.org/content/early/2016/05/16/053629.full
AB - Today’s low cost digital data provides unprecedented opportunities for scientific discovery from synthesis studies. For example, the medical field is revolutionizing patient care by creating personalized treatment plans based upon mining electronic medical records, imaging, and genomics data. Standardized annotations are essential to subsequent analyses for synthesis studies. However, accurately combining records from diverse studies requires tedious and error-prone human curation, posing a significant barrier to synthesis studies. We propose a novel natural language processing (NLP) algorithm, Synthesize, to merge data annotations automatically. Application to patient characteristics for diverse human cancers and ecological datasets demonstrates the accuracy of Synthesize in diverse scientific disciplines. This NLP approach is implemented in an open-source software package, Synthesizer. Synthesizer is a generalized, user-friendly system for error-free data merging.