TY  - JOUR
T1  - variant2literature: full text literature search for genetic variants
JF  - bioRxiv
DO  - 10.1101/583450
SP  - 583450
AU  - Yin-Hung Lin
AU  - Yu-Chen Lu
AU  - Jacob Shujui Hsu
AU  - Ko-Han Lee
AU  - Yi-Wei Cheng
AU  - Yi-Chieh Chen
AU  - Ting-Fu Chen
AU  - Jhih-Sheng Fan
AU  - Chien-Ta Tu
AU  - Chen-Ming Hsu
AU  - Chih-Chen Chou
AU  - Pei-Lung Chen
AU  - Yi-Chin Ethan Tu
AU  - Chien-Yu Chen
Y1  - 2019/01/01
UR  - http://biorxiv.org/content/early/2019/03/26/583450.abstract
N2  - Motivation: Whole genome sequencing (WGS) by next-generation sequencing produces millions of variants for an individual. The retrieval of biomedical literature for such a large number of genetic variants remains challenging, because in many cases the variants are only present in tables as images, or in the supplementary documents of which the file formats are diverse. Results: The proposed tool named variant2literature from the TaiGenomics (Toolkits for AI genomics) resolves the problem by incorporating text recognition with image processing. In addition to the adoption of advanced text retrieval, the recall rate of finding the literature containing the variants of interest is further improved by employing the skill of variant normalization. Different variant presentations are transformed into chromosome coordinates (standard VCF format) such that false negatives can be largely avoided. variant2literature is available in two ways. First, a web-based interface is provided to search all the literature in PMC Open Access Subset. Second, the command-line executable can be downloaded such that the users are free to search all the files in a specified directory locally. Availability: http://variant2literature.taigenomics.com/ Contact: chienyuchen{at}ntu.edu.tw
ER  - 