Abstract
Purpose High-throughput sequencing has revolutionized genetic disorder diagnosis, but variant pathogenicity interpretation is still challenging. Even though the Human Genome Variation Society (HGVS) provides recommendations for variant nomenclature, discrepancies in annotation remain a significant hurdle.
Methods This study evaluated annotation concordance among three tools, ANNOVAR, SnpEff, and Variant Effect Predictor (VEP), using 164,549 two-star variants from ClinVar. String-match comparisons of HGVS nomenclature were used to assess the consistency of each tool's annotations, the corresponding coding impacts, and the ACMG criteria inferred from the annotations.
Results The analysis revealed variable concordance rates, with 58.52% agreement for HGVSc, 84.04% for HGVSp, and 85.58% for the coding impact. SnpEff showed the highest match rate for HGVSc (0.988), while VEP performed best for HGVSp (0.977). Substantial discrepancies were noted in the Loss-of-Function (LoF) category. Incorrect PVS1 interpretations affected the final pathogenicity classification and downgraded PLP variants (ANNOVAR 55.9%, SnpEff 66.5%, VEP 67.3%), risking false negatives for clinically relevant variants in reports.
Conclusions These findings highlight the critical challenges that annotation discrepancies pose for accurate interpretation of variant pathogenicity. To enhance the reliability of genetic variant interpretation in clinical practice, standardizing transcript sets and systematically cross-validating results across multiple annotation tools are essential.
This study examined the consistency of variant annotations produced by three widely used open-source tools (ANNOVAR, SnpEff, and VEP) against 164,549 ClinVar two-star variants. The investigation covered HGVS-based transcript and protein nomenclature as well as coding impact annotation. The results showed that none of the tools was fully consistent with ClinVar across all coding impact categories; the LoF category exhibited the poorest consistency. This inconsistency may lead to discrepancies in PVS1 interpretation, affecting the final pathogenicity assessment. Loss of the PVS1 criterion resulted in a significant downgrading of PLP variants, potentially leading to the omission of clinically relevant variants in reports.
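As an illustration of the string-match comparison described above, the following minimal Python sketch shows one way a per-tool match rate and an all-tools agreement rate could be computed. It is not the authors' pipeline: the function names (`normalize`, `match_rate`, `all_tools_agree`), the normalization step that drops the transcript prefix, and the toy HGVSc strings and tool labels are all assumptions made for this example.

```python
from typing import Dict, List, Optional


def normalize(hgvs: Optional[str]) -> Optional[str]:
    """Trim whitespace and drop a leading 'transcript:' prefix.

    This normalization is an assumption of the sketch; the study may
    have compared the strings differently.
    """
    if hgvs is None:
        return None
    text = hgvs.strip()
    if ":" in text:
        text = text.split(":", 1)[1]
    return text or None


def is_match(reference: Optional[str], predicted: Optional[str]) -> bool:
    """Exact string match after normalization; missing values never match."""
    ref = normalize(reference)
    return ref is not None and ref == normalize(predicted)


def match_rate(reference: List[Optional[str]],
               predicted: List[Optional[str]]) -> float:
    """Fraction of variants whose annotation matches the reference."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have equal length")
    if not reference:
        return 0.0
    hits = sum(is_match(r, p) for r, p in zip(reference, predicted))
    return hits / len(reference)


def all_tools_agree(reference: List[Optional[str]],
                    per_tool: Dict[str, List[Optional[str]]]) -> float:
    """Fraction of variants for which every tool matches the reference."""
    if not reference:
        return 0.0
    hits = 0
    for i, ref in enumerate(reference):
        if all(is_match(ref, calls[i]) for calls in per_tool.values()):
            hits += 1
    return hits / len(reference)


# Toy data only (hypothetical transcript and variants, not from the study).
clinvar = [
    "NM_000001.1:c.100A>G",
    "NM_000001.1:c.200del",
    "NM_000001.1:c.300dup",
    "NM_000001.1:c.400C>T",
]
tools = {
    "tool_a": ["c.100A>G", "c.200del", "c.301dup", "c.400C>T"],
    "tool_b": ["NM_000001.1:c.100A>G", "c.200delA", "c.300dup", "c.400C>T"],
}

for name, calls in tools.items():
    print(name, match_rate(clinvar, calls))
print("all tools", all_tools_agree(clinvar, tools))
```

In this toy example each tool disagrees with the reference on a different variant, so the all-tools agreement rate is lower than either per-tool match rate. The same effect would explain how overall agreement for HGVSc can sit well below the best single-tool match rate.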
Competing Interest Statement
J.H. Huang, Y.B. Wang, and T.H. Yuan are employees of TAIGenomics Co., Ltd., Taiwan, and were involved in the development of the GDK platform, providing technical support for this research. The authors declare no other competing interests.
Abbreviations
- ACMG: American College of Medical Genetics and Genomics
- AF: Allele frequency
- AMP: Association for Molecular Pathology
- BLB: Benign and Likely Benign
- Del: Deletion
- Dup: Duplication
- GDK: Gendiseak platform
- HGVS: Human Genome Variation Society
- HGVSc: The HGVS coding sequence name
- HGVSp: The HGVS protein sequence name
- Indel: Insertion-deletion
- Inv: Inversion
- Ins: Insertion
- LOEUF: The loss-of-function observed/expected upper bound fraction
- LoF: Loss-of-Function
- MAF: Minor allele frequency
- MANE: Matched Annotation from the NCBI and EMBL-EBI
- MC: Molecular consequence
- MS: Microsatellite
- MT: Mitochondria
- MNV: Multi-Nucleotide Variant
- NGS: Next-Generation Sequencing
- NMD: Nonsense-Mediated Decay
- Nc: Non-coding
- PLP: Pathogenic and Likely Pathogenic
- SO: Sequence Ontology
- SNV: Single Nucleotide Variant
- VCF: Variant Call Format File
- VEP: Ensembl Variant Effect Predictor
- VUS: Variant of Uncertain Significance