RT Journal Article
SR Electronic
T1 Accurate Name Entity Recognition for Biomedical Literatures: A Combined High-quality Manual Annotation and Deep-learning Natural Language Processing Study
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2021.09.15.460567
DO 10.1101/2021.09.15.460567
A1 Dao-Ling Huang
A1 Quanlei Zeng
A1 Yun Xiong
A1 Shuixia Liu
A1 Chaoqun Pang
A1 Menglei Xia
A1 Ting Fang
A1 Yanli Ma
A1 Cuicui Qiang
A1 Yi Zhang
A1 Yu Zhang
A1 Hong Li
A1 Yuying Yuan
YR 2021
UL http://biorxiv.org/content/early/2021/09/17/2021.09.15.460567.abstract
AB A combined high-quality manual annotation and deep-learning natural language processing study is reported to achieve accurate named entity recognition (NER) for biomedical literature. An in-house set of entity annotation guidelines for biomedical literature was constructed. Our manual annotations show an overall consistency above 92% with previously published, publicly available annotated corpora from other experts, across all four entity types (gene, variant, disease and species). A total of 400 full-text biomedical articles from PubMed were annotated according to these in-house guidelines. Both a BERT-based large model and a DistilBERT-based simplified model were constructed, trained and optimized for offline and online inference, respectively. The F1-scores for NER of gene, variant, disease and species are 97.28%, 93.52%, 92.54% and 95.76% for the BERT-based model, and 95.14%, 86.26%, 91.37% and 89.92% for the DistilBERT-based model, respectively. The F1-scores of the DistilBERT-based NER model thus retain 97.8%, 92.2%, 98.7% and 93.9% of those of the BERT-based model for gene, variant, disease and species, respectively. Moreover, both our BERT-based and DistilBERT-based NER models outperform the state-of-the-art model, BioBERT, indicating the importance of training an NER model on biomedical-domain literature together with high-quality annotated datasets. Competing Interest Statement: The authors have declared no competing interest.
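The retention figures quoted in the abstract can be cross-checked directly from the reported F1-scores; the relation below is a back-of-the-envelope sketch, assuming each retention value is simply the ratio of the DistilBERT-based to the BERT-based F1-score for that entity type (the record itself does not state the formula).

\[
\text{retention} = \frac{\mathrm{F1}_{\text{DistilBERT}}}{\mathrm{F1}_{\text{BERT}}} \times 100\%,
\qquad
\frac{95.14}{97.28} \approx 97.8\%\ (\text{gene}),\quad
\frac{86.26}{93.52} \approx 92.2\%\ (\text{variant}),\quad
\frac{91.37}{92.54} \approx 98.7\%\ (\text{disease}),\quad
\frac{89.92}{95.76} \approx 93.9\%\ (\text{species}).
\]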