TY  - JOUR
T1  - Text Mining of Disease–Lifestyle Associations to Explain Comorbidities in Electronic Health Registries
JF  - bioRxiv
DO  - 10.1101/168211
SP  - 168211
AU  - Lars Juhl Jensen
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/07/25/168211.abstract
N2  - Mining of electronic health registries can reveal vast numbers of disease correlations (from hereon referred to as comorbidities for simplicity). However, the underlying causes can be hard to identify, in part because health registries usually do not record important lifestyle factors such as diet, substance consumption, and physical activity. To address this challenge, I developed a text-mining approach that uses dictionaries of diseases and lifestyle factors for named entity recognition and subsequently for co-occurrence extraction of disease–lifestyle associations from Medline. I show that this approach is able to extract many correct associations and provide proof-of-concept that these can provide plausible explanations for comorbidities observed in Swedish and Danish health registry data.
ER  - 