PT - JOURNAL ARTICLE
AU - Lars Juhl Jensen
TI - Text Mining of Disease–Lifestyle Associations to Explain Comorbidities in Electronic Health Registries
AID - 10.1101/168211
DP - 2017 Jan 01
TA - bioRxiv
PG - 168211
4099 - http://biorxiv.org/content/early/2017/07/25/168211.short
4100 - http://biorxiv.org/content/early/2017/07/25/168211.full
AB - Mining of electronic health registries can reveal vast numbers of disease correlations (hereafter referred to as comorbidities for simplicity). However, the underlying causes can be hard to identify, in part because health registries usually do not record important lifestyle factors such as diet, substance consumption, and physical activity. To address this challenge, I developed a text-mining approach that uses dictionaries of diseases and lifestyle factors for named entity recognition and subsequently for co-occurrence extraction of disease–lifestyle associations from Medline. I show that this approach is able to extract many correct associations and provide proof of concept that these associations can offer plausible explanations for comorbidities observed in Swedish and Danish health registry data.
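
The abstract describes dictionary-based named entity recognition followed by co-occurrence extraction. The sketch below illustrates that general idea only; it is not the author's implementation. The toy dictionaries, the function names (compile_dictionary, tag, cooccurrences), the naive sentence splitter, and the choice to count raw sentence-level co-mentions are all assumptions made for illustration. The abstract does not state how associations are scored.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy dictionaries: canonical name -> surface forms (synonyms).
DISEASES = {
    "type 2 diabetes": ["type 2 diabetes", "t2dm"],
    "lung cancer": ["lung cancer"],
}
LIFESTYLE = {
    "smoking": ["smoking", "tobacco use"],
    "physical activity": ["physical activity", "exercise"],
}


def compile_dictionary(dictionary):
    """Build one case-insensitive, word-bounded regex per entity."""
    compiled = {}
    for name, synonyms in dictionary.items():
        # Try longer synonyms first so longer phrases win over shorter ones.
        alternatives = sorted(synonyms, key=len, reverse=True)
        pattern = r"\b(?:" + "|".join(re.escape(s) for s in alternatives) + r")\b"
        compiled[name] = re.compile(pattern, re.IGNORECASE)
    return compiled


def tag(sentence, compiled):
    """Return the set of dictionary entities mentioned in a sentence."""
    return {name for name, rx in compiled.items() if rx.search(sentence)}


def cooccurrences(abstracts):
    """Count sentences in which a disease and a lifestyle factor co-occur."""
    diseases = compile_dictionary(DISEASES)
    lifestyle = compile_dictionary(LIFESTYLE)
    counts = Counter()
    for text in abstracts:
        # Naive sentence split on terminal punctuation (an assumption).
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            pairs = product(tag(sentence, diseases), tag(sentence, lifestyle))
            for pair in pairs:
                counts[pair] += 1
    return counts


if __name__ == "__main__":
    demo = [
        "Smoking is a major risk factor for lung cancer. "
        "Exercise lowers the risk of T2DM.",
        "Tobacco use was associated with lung cancer in this cohort.",
    ]
    for (disease, factor), n in cooccurrences(demo).most_common():
        print(disease, factor, n)
```

Raw counts favor frequently mentioned entities, so a practical system would likely normalize them against how often each term appears on its own; that step is left out here because the abstract does not describe it.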