Abstract
Whole exome sequencing (WES) is a time-consuming technology for identifying clinical variants, and it demands accurate variant-calling tools. Currently available tools compromise accuracy when predicting specific types of variants. It is therefore important to identify the best aligner-variant caller combinations for detecting SNVs and InDels separately. Moreover, many important aspects of InDel detection are overlooked when tool performance is compared; one such aspect is the detection of InDels with respect to their length in base pairs. To assess the performance of variant callers (especially for InDels) in combination with different aligners, we developed 20 automated pipelines and evaluated them using the gold-standard reference variant dataset for human whole exome sequencing (NA12878) from the Genome in a Bottle (GiaB) consortium. Additionally, we compared the pipelines on exome data simulated from two human reference genome builds (GRCh37 and GRCh38). Across various performance metrics, the BWA and Novoalign aligners performed better with the DeepVariant and SAMtools callers for detecting SNVs, and with DeepVariant and GATK for detecting InDels. Overall, DeepVariant combined with either BWA or Novoalign performed best. Furthermore, we showed that merging the top-performing pipelines improved the accuracy of the variant call set. Collectively, this study should help investigators improve the sensitivity and accuracy of detecting specific types of variants.
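The evaluation and merging steps summarized above can be illustrated with a minimal Python sketch. It is not the study's pipeline and rests on stated assumptions: variants are assumed to be normalized and keyed by (chromosome, position, REF, ALT), so a call counts as a true positive only if it exactly matches a truth-set record (dedicated benchmarking tools such as hap.py or RTG vcfeval perform haplotype-aware comparison instead); SNVs and InDels are scored separately; and the merging rule, a majority vote controlled by a `min_support` threshold, is a hypothetical choice because the merging strategy is not specified here. The helper names `classify`, `indel_length`, `by_type`, `metrics`, and `merge_consensus` are illustrative, and the toy data are not results from the study.

```python
from collections import Counter

# Hypothetical representation: a normalized variant keyed as (chrom, pos, ref, alt).
Variant = tuple[str, int, str, str]


def classify(v: Variant) -> str:
    """Label a variant as an SNV, an InDel, or other (e.g. an MNP)."""
    _, _, ref, alt = v
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "InDel"
    return "other"


def indel_length(v: Variant) -> int:
    """Net inserted or deleted length in base pairs (0 for SNVs)."""
    _, _, ref, alt = v
    return abs(len(ref) - len(alt))


def by_type(calls: set[Variant], kind: str) -> set[Variant]:
    """Keep only variants of one type, so SNVs and InDels are scored separately."""
    return {v for v in calls if classify(v) == kind}


def metrics(calls: set[Variant], truth: set[Variant]) -> dict[str, float]:
    """Precision, recall and F1 from exact-match comparison with a truth set."""
    tp = len(calls & truth)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "F1": f1}


def merge_consensus(call_sets: list[set[Variant]], min_support: int) -> set[Variant]:
    """Assumed merging rule: keep variants reported by at least `min_support` pipelines."""
    counts = Counter(v for calls in call_sets for v in calls)
    return {v for v, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Toy data for illustration only; not results from the study.
    truth = {("chr1", 100, "A", "G"), ("chr1", 200, "AT", "A"), ("chr2", 50, "C", "CTT")}
    pipeline_a = {("chr1", 100, "A", "G"), ("chr1", 200, "AT", "A"), ("chr3", 10, "G", "T")}
    pipeline_b = {("chr1", 100, "A", "G"), ("chr2", 50, "C", "CTT"), ("chr4", 7, "T", "C")}
    pipeline_c = {("chr1", 100, "A", "G"), ("chr1", 200, "AT", "A"), ("chr2", 50, "C", "CTT")}

    # Score each pipeline separately for SNVs and InDels.
    for name, calls in [("A", pipeline_a), ("B", pipeline_b), ("C", pipeline_c)]:
        for kind in ("SNV", "InDel"):
            print(name, kind, metrics(by_type(calls, kind), by_type(truth, kind)))

    merged = merge_consensus([pipeline_a, pipeline_b, pipeline_c], min_support=2)
    print("merged", metrics(merged, truth))
    print("InDel lengths (bp):", sorted(indel_length(v) for v in by_type(merged, "InDel")))
```

Keying variants as tuples turns TP, FP, and FN into plain set operations, and binning InDels by `indel_length` is one way to examine performance by base-pair length. Raising `min_support` can only shrink the merged set, which tends to remove false positives at the risk of also dropping true positives.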