RT Journal Article
SR Electronic
T1 Pangaea: A modular and extensible collection of tools for mining context dependent gene relationships from the biomedical literature
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2020.04.02.022517
DO 10.1101/2020.04.02.022517
A1 Pirvan, Liviu
A1 Samarajiwa, Shamith A.
YR 2020
UL http://biorxiv.org/content/early/2020/04/03/2020.04.02.022517.abstract
AB Motivation: Pangaea is scalable and extensible command-line interface (CLI) software that integrates gene-relationship detection features to extract context-dependent, structured gene-gene and gene-term relationships from the biomedical literature. It provides computational methods to identify biological relationships among a collection of genes and can be used to search for and extract different types of contextual relationships amongst genes. Results: We implemented CLI-based software for downloading PubMed articles and extracting gene relationships from their abstracts using natural language processing methods. For scalability, the software was designed to support the retrieval and processing of millions of articles whilst minimising memory requirements and optimising for parallel processing on multiple CPU cores. For extensibility, the tool permits the use of custom-made contextual models for the text-processing stages, and the output is serialised as JSON objects to allow flexible post-processing workflows. Availability: The software is available online at: https://github.com/ss-lab-cancerunit/pangaea