PT - JOURNAL ARTICLE
AU - Chun-chao Lo
AU - Shubo Tian
AU - Yuchuan Tao
AU - Jie Hao
AU - Jinfeng Zhang
TI - Developing a More Accurate Biomedical Literature Retrieval Method using Deep Learning and Citations in PubMed Central Full-text Articles
AID - 10.1101/2021.10.21.465340
DP - 2021 Jan 01
TA - bioRxiv
PG - 2021.10.21.465340
4099 - http://biorxiv.org/content/early/2021/10/23/2021.10.21.465340.short
4100 - http://biorxiv.org/content/early/2021/10/23/2021.10.21.465340.full
AB - Most queries submitted to a literature search engine can be more precisely written as sentences to give the search engine more specific information. Sentence queries should be more effective, in principle, than short queries with small numbers of keywords. Querying with full sentences is also a key step in question-answering and citation recommendation systems. Despite the considerable progress in natural language processing (NLP) in recent years, using sentence queries on current search engines does not yield satisfactory results. In this study, we developed a deep learning-based method for sentence queries, called DeepSenSe, using citation data available in full-text articles obtained from PubMed Central (PMC). A large amount of labeled data was generated from millions of matched citing sentences and cited articles, making it possible to train quality predictive models using modern deep learning techniques. A two-stage approach was designed: in the first stage we used a modified BM25 algorithm to obtain the top 1000 relevant articles; the second stage involved re-ranking the relevant articles using DeepSenSe. We tested our method using a large number of sentences extracted from real scientific articles in PMC. Our method performed substantially better than PubMed and Google Scholar for sentence queries. Competing Interest Statement: The authors have declared no competing interest.
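
The two-stage approach described in the abstract (a BM25 first pass that keeps the top candidates, followed by re-ranking) can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: it uses the standard Okapi BM25 formula rather than the authors' modified BM25, whose details the abstract does not give, and `overlap_reranker` is a hypothetical word-overlap stand-in for the DeepSenSe model, which is not reproduced here. The names `BM25`, `two_stage_search`, and `first_stage_k` are likewise invented for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would do far more.
    return text.lower().split()


class BM25:
    """Standard Okapi BM25 (NOT the modified variant from the paper)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in self.doc_tokens]
        self.avgdl = sum(self.doc_len) / len(docs)
        self.tf = [Counter(t) for t in self.doc_tokens]
        # Document frequency: number of documents containing each term.
        df = Counter()
        for tokens in self.doc_tokens:
            df.update(set(tokens))
        n = len(docs)
        self.idf = {
            w: math.log(1 + (n - f + 0.5) / (f + 0.5)) for w, f in df.items()
        }

    def score(self, query_tokens, i):
        total = 0.0
        for w in query_tokens:
            f = self.tf[i].get(w, 0)
            if f == 0:
                continue
            length_norm = 1 - self.b + self.b * self.doc_len[i] / self.avgdl
            total += self.idf[w] * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total

    def top_k(self, query, k):
        q = tokenize(query)
        scored = [(self.score(q, i), i) for i in range(len(self.doc_tokens))]
        # Keep only documents that match at least one query term.
        scored = [(s, i) for s, i in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [i for _, i in scored[:k]]


def overlap_reranker(query, doc):
    # Hypothetical placeholder for a learned model: Jaccard word overlap.
    q, d = set(tokenize(query)), set(tokenize(doc))
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def two_stage_search(query, docs, first_stage_k=1000, rerank=overlap_reranker):
    # Stage 1: BM25 retrieves up to first_stage_k candidate documents.
    candidates = BM25(docs).top_k(query, first_stage_k)
    # Stage 2: candidates are re-ordered by the (placeholder) re-ranker.
    return sorted(candidates, key=lambda i: (-rerank(query, docs[i]), i))
```

Calling `two_stage_search(sentence, articles)` returns indices into `articles`, best match first. The split keeps the expensive scorer off the full collection: only the candidates that survive the cheap first pass are re-ranked, which is the usual motivation for this kind of pipeline.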