bioRxiv
New Results

Towards automation of germline variant curation in clinical cancer genetics

Vignesh Ravichandran, Zarina Shameer, Yelena Kernel, Michael Walsh, Karen Cadoo, Steven Lipkin, Diana Mandelker, Liying Zhang, Zsofia Stadler, Mark Robson, Kenneth Offit, Joseph Vijai
doi: https://doi.org/10.1101/295865
Vignesh Ravichandran
Niehaus Center For Inherited Cancer Genomics, Memorial Sloan Kettering Cancer Center; Clinical Genetics Service, Department of Medicine, Memorial Sloan Kettering Cancer Center
Zarina Shameer
Niehaus Center For Inherited Cancer Genomics, Memorial Sloan Kettering Cancer Center
Yelena Kernel
Niehaus Center For Inherited Cancer Genomics, Memorial Sloan Kettering Cancer Center; Clinical Genetics Service, Department of Medicine, Memorial Sloan Kettering Cancer Center
Michael Walsh
Clinical Genetics Service, Department of Medicine, Memorial Sloan Kettering Cancer Center
Karen Cadoo
Clinical Genetics Service, Department of Medicine, Memorial Sloan Kettering Cancer Center
Steven Lipkin
Weill Cornell Medical College, New York, N.Y., 10065
Diana Mandelker
Diagnostic Molecular Pathology, Memorial Sloan Kettering Cancer Center
Liying Zhang
Diagnostic Molecular Pathology, Memorial Sloan Kettering Cancer Center
Zsofia Stadler
Clinical Genetics Service, Department of Medicine, Memorial Sloan Kettering Cancer Center
Mark Robson
Niehaus Center For Inherited Cancer Genomics, Memorial Sloan Kettering Cancer Center; Clinical Genetics Service, Department of Medicine, Memorial Sloan Kettering Cancer Center; Weill Cornell Medical College, New York, N.Y., 10065; Breast Medicine Service, Department of Medicine, Memorial Sloan Kettering Cancer Center
Kenneth Offit
Niehaus Center For Inherited Cancer Genomics, Memorial Sloan Kettering Cancer Center; Clinical Genetics Service, Department of Medicine, Memorial Sloan Kettering Cancer Center; Weill Cornell Medical College, New York, N.Y., 10065; Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, N.Y., 10065
Joseph Vijai
Niehaus Center For Inherited Cancer Genomics, Memorial Sloan Kettering Cancer Center; Clinical Genetics Service, Department of Medicine, Memorial Sloan Kettering Cancer Center; Weill Cornell Medical College, New York, N.Y., 10065
  • For correspondence: josephv@mskcc.org

Abstract

Cancer care professionals are confronted with interpreting results from multiplexed gene sequencing of patients at hereditary risk for cancer. Assessments for variant classification now require orthogonal data searches and aggregation of multiple lines of evidence from diverse resources. The burden of evidence for each variant to meet thresholds for pathogenicity or actionability now poses a growing challenge for those seeking to counsel patients and families following germline genetic testing. A computational algorithm that automates, standardizes and significantly accelerates this interpretive process is needed. The tool described here, Pathogenicity of Mutation Analyzer (PathoMAN), automates germline genomic variant curation from clinical sequencing based on ACMG guidelines. PathoMAN aggregates multiple tracks of genomic, protein and disease specific information from public sources. We compared expert manually curated variant data from studies on (i) prostate cancer, (ii) breast cancer and (iii) ClinVar to assess performance. PathoMAN achieves high concordance (83.1% pathogenic, 75.5% benign) and negligible discordance (0.04% pathogenic, 0.9% benign) when contrasted against expert curation. Some loss of resolution (8.6% pathogenic, 23.64% benign) and gain of resolution (6.6% pathogenic, 1.6% benign) were also observed. We highlight the advantages and weaknesses related to the programmable automation of variant classification. We also propose a new nosology for the five ACMG classes to facilitate more accurate reporting to ClinVar. The proposed refinements will enhance the utility of ClinVar to allow further automation in cancer genetics. PathoMAN will reduce the manual workload of domain level experts. It provides a substantial advance in rapid classification of genetic variants by generating robust models using a knowledge-base of diverse genetic data (https://pathoman.mskcc.org).

Introduction

The uptake of genetic testing and targeted resequencing of cancer susceptibility genes to facilitate precision cancer prevention and early diagnosis has grown exponentially in concert with the decreasing costs of next generation sequencing (NGS) [1,2]. A major challenge is the interpretation of sequence variants that result from clinical sequencing. The American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) have published guidelines on interpretation of germline variants, taking into account not only their pathogenicity but also their clinical actionability [3]. Nonetheless, germline variant classification continues to pose an immense burden on the time and resources of diagnostic molecular labs and cancer care professionals. The ACMG classification schema requires manually exploring multiple lines of evidence from public data, other orthogonal data sources and the literature, then aggregating and scoring them to classify variants [4]. There is currently no automated computational framework for classifying genetic variants that is widely available. We developed PathoMAN, a computational resource that automates this classification with uniformity, transparency and speed, in order to facilitate variant curation for the cancer genetics community.

PathoMAN is a Python-based variant curation algorithm that classifies germline genomic variants detected in the context of clinical cancer genetics sequencing. The schema is inspired by the ACMG/AMP classification [3]. It aggregates multiple tracks of genetic and molecular evidence using variant annotators and public repositories containing evidence for pathogenicity assertion. The compiled data are then used in 28 distinct categories addressing the type of the mutation, its biological impact, in silico predictions, presence in the control cohort, and the familial information and inheritance pattern of the mutation. The aggregate score resulting from these categories is used in classifying the variant as Pathogenic (P), Likely-Pathogenic (LP), Benign (B), Likely-Benign (LB) or Variant of Uncertain Significance (VUS).

The performance of PathoMAN was measured by re-evaluating expertly curated germline variants in cancer genes compiled from two large published studies on breast and prostate cancer. In this study, we also assess the frequency of clinically actionable mutations present in the general population, using ExAC data in cancer susceptibility and cancer predisposing genes as well as in ClinVar. We test the application of ACMG criteria for germline cancer variants and address the bottleneck of variant curation by using automated algorithms in variant classification.

Materials and Methods

Test datasets

We tested 11,196 germline variants in 76 genes of highest cancer relevance to describe and compare performance of the algorithm in cancer related genomic variant data. Clinically actionable variants (P/LP) in these 76 genes are reported back to patients at MSKCC as part of the IMPACT® assay using the appropriate IRB approved protocol [5,6]. Several of the genes in this set are strong cancer predisposition genes, while others are putative candidate genes across one or more syndromic cancers. We use this gene list, referred to as IMPACT-76 [7] (Suppl Table 1), uniformly throughout this manuscript. We chose exonic and essential splice site (+/− 1,2) variants in the IMPACT-76 genes from three manually curated datasets: (i) a prostate cancer study [8]; (ii) a breast cancer study [9]; and (iii) ClinVar [10,11]. The germline variants in the cancer datasets were curated by experts using the ACMG classification guidelines. Some adjustments or manual overrides were performed based on the interpretation of the variant in a disease and on evidence from the literature [9]. It should also be noted that the breast dataset analysis used the Exome Sequencing Project (EVS6500) and 1000 Genomes for population allele frequencies, and the ClinVar version available in early 2016.

The Exome Aggregation Consortium (ExAC) [12] is a joint effort to aggregate exome sequencing data from fourteen large sequencing projects to provide summary data, such as ethnicity specific allele frequency, for a wider scientific community. The ExAC-noTCGA dataset is a subset of 53,105 samples that excludes the 7,601 cancer germline samples from The Cancer Genome Atlas (TCGA). The variants in the test data were compared against the ExAC-noTCGA Non-Finnish European population and were considered not to have de novo evidence or co-segregation, as this information was unavailable for these datasets. However, PathoMAN can use such information, if available, to assign ACMG classes. The results were compared against the manual curation. They are reported here in four categories: concordance, discordance, loss of resolution and gain of resolution.

When reported P/LP and B/LB variants are re-classified as P/LP and B/LB respectively by PathoMAN, the results are considered concordant. Similarly, when reported P/LP and B/LB variants are re-classified as B/LB and P/LP respectively by PathoMAN, the variants are considered discordant. When reported P/LP or B/LB variants are re-classified as VUS by PathoMAN, they are placed in the loss of resolution (LOR) category, and when reported VUS are re-classified as P/LP or B/LB, they are placed in the gain of resolution (GOR) category (Table 1). As we describe below, these evaluations aid in understanding the real-world usage of the eight ACMG categories of evidence in cancer genomics, and in their ability to discriminate between P/LP, B/LB and VUS.
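The four comparison categories can be illustrated with a short Python sketch. This is not PathoMAN's code; the function names and the "unresolved" label (for a reported VUS that remains VUS, a case the text does not name) are assumptions made for illustration.

```python
PATHOGENIC = {"P", "LP"}
BENIGN = {"B", "LB"}


def _group(label: str) -> str:
    """Collapse the five ACMG classes into P/LP, B/LB or VUS."""
    if label in PATHOGENIC:
        return "P/LP"
    if label in BENIGN:
        return "B/LB"
    if label == "VUS":
        return "VUS"
    raise ValueError(f"unknown classification: {label!r}")


def rate_comparison(reported: str, predicted: str) -> str:
    """Rate one variant: expert (reported) call versus automated (predicted) call."""
    rep, pred = _group(reported), _group(predicted)
    if rep == "VUS":
        # A VUS that stays VUS is not named in the text; "unresolved" is our label.
        return "unresolved" if pred == "VUS" else "gain of resolution"
    if pred == "VUS":
        return "loss of resolution"
    return "concordant" if rep == pred else "discordant"
```

For example, `rate_comparison("LP", "P")` is rated concordant because both calls fall in the P/LP group, while `rate_comparison("B", "VUS")` is a loss of resolution.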

Table 1: Rating the comparison of results between PathoMAN variant curation and domain expert-curation of germline variants.

ExAC subset of IMPACT-76

ExAC [12] is a public resource often utilized as convenience controls for several human traits in case-control studies. We wanted to estimate the burden of variants in ExAC-noTCGA as classified by PathoMAN and contrast it against known information in ClinVar. We selected 55,566 variants from IMPACT-76 genes which were in exonic or essential splice site regions. This subset is not considered part of the test datasets described earlier, as ExAC data are used as part of the ACMG criteria PS4, PM2, BA1, BS1 and BS2.

ACMG/AMP guidelines

The 2015 ACMG/AMP guidelines consist of 16 criteria that aid in classifying pathogenicity and 12 criteria that aid in classifying benignity. Pathogenicity criteria are broadly grouped as: very strong evidence (PVS1), strong evidence (PS1-PS4), moderate evidence (PM1-PM6) and supporting evidence (PP1-PP5). Benign criteria are broadly grouped as: standalone evidence (BA1), strong evidence (BS1-BS4) and supporting evidence (BP1-BP7). This classification system resolves a variant as pathogenic or benign based on eight components: population frequency data, genomic annotation and computational predictive data, functional data, segregation data, de novo data, allelic/genotypic data, public databases and literature and other data (Table 2). The variant classification criteria used by PathoMAN were inspired by these ACMG/AMP guidelines, although they do not precisely adhere to these published norms. We describe below the criteria and their modifications for variant classification by PathoMAN for cancer genetics.

Variant Annotation

PathoMAN makes use of CAVA [13] for genomic annotation, dbNSFP [14–16] via Annovar [17] for in silico predictions, ExAC-noTCGA and gnomAD [12] for public control frequencies, ClinVar [10,11] for public evidence, and a curated list of variants from the literature for functional evidence.

Determination of PVS1: null variants

Curated lists of cancer-causing genes from the literature, various genetic testing panels [18–22], and OMIM genes that cause autosomal dominant disease [23] were aggregated. If the variant in a gene from this list was a Tier 1 mutation (frameshift, truncating, essential splice variant or initiation codon) and was not present in the last exon, then PVS1 was scored 1. A gene with a functional domain encoded by the last exon, such as ATM, was an exception to the last exon criterion [24]. PVS1 was not scored for BRCA2 mutations observed after the polymorphic stop rs11571833 (K3326X).
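The PVS1 rule described above can be sketched in Python as follows. This is an illustrative reading of the text, not the authors' implementation: the consequence labels, the function signature and the handling of the BRCA2 boundary (codon positions strictly greater than 3326; the text does not say how K3326X itself is treated) are assumptions.

```python
# Hypothetical consequence labels for the Tier 1 mutation types named in the text.
TIER1_CONSEQUENCES = {"frameshift", "stop_gain", "essential_splice", "initiation_codon"}

# Genes whose last exon encodes a functional domain (the text gives ATM as an example).
LAST_EXON_EXEMPT_GENES = {"ATM"}

# Codon of the polymorphic BRCA2 stop rs11571833 (K3326X).
BRCA2_POLYMORPHIC_STOP_CODON = 3326


def score_pvs1(gene, consequence, in_last_exon, protein_pos=None,
               disease_genes=frozenset()):
    """Return 1 if PVS1 applies under the rule sketched in the text, else 0."""
    if gene not in disease_genes or consequence not in TIER1_CONSEQUENCES:
        return 0
    # Null variants in the last exon are not scored, except in exempt genes.
    if in_last_exon and gene not in LAST_EXON_EXEMPT_GENES:
        return 0
    # Assumption: "after K3326X" means a codon position greater than 3326.
    if (gene == "BRCA2" and protein_pos is not None
            and protein_pos > BRCA2_POLYMORPHIC_STOP_CODON):
        return 0
    return 1
```

Passing the curated gene list as `disease_genes` keeps the rule itself separate from the knowledge-base, so the list can be updated without touching the logic.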

Table 2: List of data sources and annotators used in building PathoMAN’s knowledge-base

Determination of PS1 and PM5: known pathogenic missense

If a missense variant was reported as pathogenic by multiple submitters with no conflicts and had a gold star rating of 2 or more in ClinVar [25,10,11], irrespective of the alternative allele but leading to the same amino acid change, then PS1 was scored 1. If a missense variant was not seen in ClinVar but another pathogenic missense variant was reported at the same amino acid position with a different amino acid change, then PM5 was coded 1.
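A minimal sketch of the PS1 and PM5 lookups, assuming ClinVar records have already been reduced to a small tuple per missense variant. The record layout and function name are hypothetical. The sketch also assumes that a gold star rating of 2 or more stands in for "multiple submitters with no conflicts", and that PM5 places no star requirement on the neighbouring pathogenic variant, since the text does not state one.

```python
from typing import NamedTuple


class ClinVarMissense(NamedTuple):
    """Hypothetical reduced ClinVar record for one missense variant."""
    gene: str
    protein_pos: int
    alt_aa: str
    significance: str  # "pathogenic" or "benign"
    gold_stars: int


def score_ps1_pm5(gene, protein_pos, alt_aa, clinvar_records):
    """Score PS1 and PM5 for a query missense variant."""
    at_residue = [r for r in clinvar_records
                  if r.gene == gene and r.protein_pos == protein_pos]
    same_change = [r for r in at_residue if r.alt_aa == alt_aa]
    scores = {"PS1": 0, "PM5": 0}
    # PS1: same amino acid change, pathogenic, gold star rating of 2 or more.
    if any(r.significance == "pathogenic" and r.gold_stars >= 2 for r in same_change):
        scores["PS1"] = 1
    # PM5: the change itself is absent from ClinVar, but a different pathogenic
    # change is reported at the same residue.
    elif not same_change and any(r.significance == "pathogenic" for r in at_residue):
        scores["PM5"] = 1
    return scores
```

Keying the lookup on the protein-level change rather than the nucleotide change follows the phrase "irrespective of the alternative allele but leading to the same amino acid change".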

Determination of PS3 or BS3: strong prior evidence of pathogenic or benign

An aggregated select list of reported pathogenic and benign variants from the literature [2,26–29] was used as a knowledge-base for PS3/BS3. If the variant was in the curated list (missense variants in BRCA1/2 reported by ENIGMA), or if it was a truncating variant that ClinVar had reported as pathogenic or benign with a gold star rating of 2 or more, then PS3 or BS3 was coded 1. Our selective use of ClinVar assigns higher confidence to the truncating variants and select missense variants reported by domain experts as either pathogenic or benign. We also include published saturation editing experimental evidence for BRCA1.

Determination of PS4 PM2 BA1 BS1 and BS2: rarity and enrichment of variant in cases

If a variant was present in aggregated public controls such as the ExAC-noTCGA dataset and gnomAD [12] with an allele frequency greater than 5%, then the variant was coded 1 for BA1. If the variant had an allele frequency in public controls between 1% and 5%, then BS1 was coded 1. If the variant was also present in a homozygous form in the public controls, then BS2 was coded 1. If the variant was absent from ExAC-noTCGA data or gnomAD general population data, then PM2 was coded 1. For variants not scored as BA1, BS1, BS2 or PM2, Fisher's exact test was performed against the user defined population (ExAC-noTCGA or gnomAD). The population included all the major groups (NFE, FIN, SAS, AMR, AFR, EAS) in ExAC and the ASJ population in the gnomAD database. If the odds ratio was greater than 3 and the p-value less than 0.05, then the variant was given a score of 1 for PS4. This was a robust measure for weighting pathogenicity in uncommon variants and non-singletons.
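The frequency-based criteria can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not PathoMAN's code: the function names are hypothetical, the 1% boundary is treated as inclusive, the two-sided p-value sums all tables no more probable than the observed one, and the odds ratio is the simple sample estimate.

```python
from math import exp, lgamma


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact(a: int, b: int, c: int, d: int):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (odds_ratio, p_value), with odds_ratio = a*d / (b*c).
    """
    row1, row2, col1 = a + b, c + d, a + c
    log_total = _log_comb(row1 + row2, col1)

    def log_prob(x: int) -> float:
        # Hypergeometric log-probability of x in the top-left cell.
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - log_total

    observed = log_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(exp(lp) for lp in map(log_prob, range(lo, hi + 1))
            if lp <= observed + 1e-9)
    odds = float("inf") if b * c == 0 else (a * d) / (b * c)
    return odds, min(p, 1.0)


def score_frequency(case_ac, case_an, ctrl_ac, ctrl_an, ctrl_hom=0):
    """Score BA1, BS1, BS2, PM2 and PS4 from case and control allele counts."""
    scores = dict.fromkeys(("BA1", "BS1", "BS2", "PM2", "PS4"), 0)
    ctrl_af = ctrl_ac / ctrl_an if ctrl_an else 0.0
    if ctrl_ac == 0:
        scores["PM2"] = 1          # absent from public controls
    elif ctrl_af > 0.05:
        scores["BA1"] = 1          # allele frequency above 5%
    elif ctrl_af >= 0.01:
        scores["BS1"] = 1          # allele frequency between 1% and 5%
    if ctrl_hom > 0:
        scores["BS2"] = 1          # homozygous carriers among controls
    if not any(scores.values()):
        # Fisher's exact test only when none of the criteria above was scored.
        odds, p = fisher_exact(case_ac, case_an - case_ac,
                               ctrl_ac, ctrl_an - ctrl_ac)
        if odds > 3 and p < 0.05:
            scores["PS4"] = 1
    return scores
```

Working in log space with `lgamma` avoids overflow when allele numbers reach the tens of thousands, as they do for ExAC-scale control sets.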

Determination of PM1: membership in a protein domain of functional significance

If the amino acid altered by the mutation was located in a protein domain, or was a residue involved in signalling, binding with other proteins, or an active site, then PM1 was coded 1. Currently, we use UniProt for annotation of protein features [30]. We acknowledge the incremental value of a curated somatic hotspot list (http://cancerhotspots.org) to aid in this classification.

Determination of PM4 BP3: genomic complexity and context of the variant

If the mutation was an in-frame insertion/deletion or a stop loss in a non-repetitive region, then the variant was coded 1 for PM4. If instead it was an in-frame insertion/deletion in a repetitive region, then BP3 was coded 1. The RepeatMasker track from the UCSC Genome Browser [31–34] was used for this criterion.

Determination of PP3 BP4: In silico prediction of deleteriousness

We used Annovar [17] to annotate the variants with the dbNSFP [14–16,35] track to obtain deleteriousness predictions from 12 in silico algorithms: CADD [36], FATHMM [37], LRT [38], MutationAssessor [39], MutationTaster [40], PROVEAN [41], Polyphen2-HDIV [42], Polyphen2-HVAR [42], RadialSVM [35], SIFT [43], VEST3 [44] and M-CAP [45]. Use of an ensemble of in silico prediction algorithms improves prediction across a wide range of genes and cancer types [46]. Hence, if more than 7 (>50%) algorithms called a variant deleterious, then the variant was coded 1 for PP3. Otherwise, BP4 was coded 1. In contrast, many of the old ClinVar records relied on only SIFT or Polyphen.
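The ensemble vote can be sketched as a simple count over the twelve predictors. The sketch follows the stated cutoff of "more than 7" deleterious calls; the function name, the boolean input format and the treatment of missing predictions as non-deleterious are assumptions, not details from the source.

```python
from typing import Mapping

PREDICTORS = (
    "CADD", "FATHMM", "LRT", "MutationAssessor", "MutationTaster", "PROVEAN",
    "Polyphen2-HDIV", "Polyphen2-HVAR", "RadialSVM", "SIFT", "VEST3", "M-CAP",
)


def score_pp3_bp4(calls: Mapping[str, bool], cutoff: int = 7):
    """Score PP3 if more than `cutoff` predictors call the variant deleterious.

    `calls` maps predictor name to True (deleterious) or False. Predictors
    missing from `calls` are counted as non-deleterious (an assumption).
    """
    votes = sum(1 for name in PREDICTORS if calls.get(name, False))
    if votes > cutoff:
        return {"PP3": 1, "BP4": 0}
    return {"PP3": 0, "BP4": 1}
```

With this cutoff, eight or more deleterious calls out of twelve are needed for PP3, and every other variant receives BP4.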

Determination of PP5 BP6: Known variant with insufficient details

If the variant was in ClinVar [10,11] with a gold star rating of less than 2 and was pathogenic, then PP5 was scored 1. If it was benign, BP6 was scored 1. Thus, a variant reported once in ClinVar does not command high value in the pathogenicity determination, but can be upgraded depending on other ancillary information tagged to it.

Determination of BP7: synonymous variants

For synonymous (silent) mutations, we used the adaptive boosting and random forest scores from dbscSNV [47]; if the score was less than 0.6, then BP7 was scored 1. dbscSNV is a database of precomputed prediction scores for SNVs that may occur in splice consensus regions. Higher scores reflect the variant's effect on splicing [47].

Determination of PP2 and BP1: missense driven disease genes

We used ClinVar [10,11] to collect all reported missense variants per gene. We then selected the confident (gold star 2 or more) pathogenic and benign calls. The genes with a higher ratio of pathogenic to benign variants were called missense-driven pathogenic genes, and the genes with a lower ratio of pathogenic to benign variants were called missense-driven benign genes. Any missense variant in the missense-driven pathogenic gene list was scored PP2 and any missense variant in the missense-driven benign gene list was scored BP1. Classic examples of such genes are PTEN [49] and TP53 [50]. Almost all PTEN missense mutations were pathogenic (Supp Figure 1).

Determination of PS2, PM6, PP1 and BS4: denovo and cosegregation

ACMG criteria require both paternity and maternity to be confirmed for de novo variants. PathoMAN requires user input for de novo status and segregation information for classification. The three options for de novo evidence are: de novo with both paternity and maternity confirmed, de novo without paternity or maternity confirmed, and no de novo evidence at all. Similarly, for co-segregation evidence, the options provided were co-segregation with the disease, lack of co-segregation with the disease, and no co-segregation information. For larger trio studies, in the future, we expect to include a module that looks for de novo variants computationally from a multisample VCF and pedigree file information.

Determination of PM3, BP2: recessive inheritance

These two categories apply to variants in recessive disease. Germline cancer variants are generally associated with cancer predisposition syndromes in an autosomal dominant inheritance pattern. Hence, PM3 and BP2 do not apply to the current PathoMAN variant classification and were scored 0. We expect to add compound heterozygosity to the next version of the algorithm. We will also incorporate into the knowledge-base select variants in mismatch repair genes that can be classified by this category.

Determination of PP4, BP5: disease specific conditions

PP4 and BP5, per ACMG, are two criteria that consider a patient’s predisposition to a specific disease with single gene aetiology. Cancer is a disease with multi-gene aetiology, although certain genes such as RB1 may be strong candidates for PP4. Once we incorporate a pedigree file and cancer phenotype variables, we should be able to apply these criteria. In the current iteration, these were scored 0.

The final classification schema was based on the original ACMG scoring and pathogenicity was predicted for the datasets described.
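For reference, the combining rules of the published 2015 ACMG/AMP guidelines can be written as a short Python function. This is a sketch of the published rules only; the text notes that PathoMAN's criteria do not precisely adhere to the guidelines, so it should not be read as PathoMAN's implementation. The function name and the input format (a list of met criterion codes) are assumptions.

```python
from collections import Counter


def classify_acmg(criteria):
    """Combine met ACMG/AMP criteria (e.g. ["PVS1", "PM2"]) into a five-tier class."""
    n = Counter()
    for code in criteria:
        # Strength prefix: PVS, PS, PM, PP, BA, BS or BP.
        n["PVS" if code.startswith("PVS") else code[:2]] += 1
    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    ba, bs, bp = n["BA"], n["BS"], n["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    # Contradictory pathogenic and benign evidence defaults to uncertain significance.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "VUS"
    if pathogenic:
        return "P"
    if likely_pathogenic:
        return "LP"
    if benign:
        return "B"
    if likely_benign:
        return "LB"
    return "VUS"
```

For example, a null variant with supporting functional evidence, `classify_acmg(["PVS1", "PS3"])`, is classified as pathogenic, while a rare missense variant with only `["PM2", "PP3"]` remains a VUS.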

Usage of ClinVar (Release 12/28/2017) in PathoMAN

ClinVar is the most popular database that processes submissions of human DNA variants and assertions made regarding their clinical significance. One of its stated goals is to support computational re-evaluation of genotypes and assertions. ClinVar currently does not provide detailed information on pathogenicity assertions, unlike another initiative, ClinGen [51,52], which aims to build a more discriminatory and accurate assessment of variant pathogenicity. However, access to the ClinGen interface is at the moment restricted to a smaller niche group. PathoMAN uses ClinVar as one of its knowledge-bases. We use the ClinVar data parser tool [53] to extract important fields that aid in variant classification. PathoMAN takes advantage of the gold star system, in which more stars are awarded to variants with stronger evidence supporting the assertion of clinical significance. PathoMAN up-weights variants with a gold star rating of 2 or more in the knowledge-base. We selected variants with gold star < 2 and with the term “cancer” in the traits field and used them as another test dataset. This test dataset was filtered for only variants in the exonic and essential splice regions in IMPACT-76 [54] genes (n=9164 variants) (Supp Table 1).

Results

PathoMAN versus manual curation for test datasets

PathoMAN is a variant classification algorithm, provided as a web based service, that allows the user either to query single variants or to upload a file with a batch of variants. The web-tool is built using the Flask web framework. A single variant query works on chromosome, position, reference allele, alternative allele, allele count, allele number, de novo status, co-segregation status and preferred control population sub-group. Batch upload requires six columns that follow VCF 4.2 format [chr, pos, ref, alt, ac, an]. The user can select de novo status, co-segregation status and preferred control population sub-group. Our pipeline is currently equipped to annotate a minimal VCF and prepare it for PathoMAN variant classification. The result of a single variant query is displayed on the web-page, while the batch upload results are mailed back to the registered user. For an annotated VCF file containing 1000 variants, PathoMAN takes 6 minutes, which is about 0.36 seconds per variant. This provides a massive advantage in terms of speed, uniformity, efficiency and service assurance over manual curation.
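As an illustration of the six-column batch input, the following Python sketch parses such a file into typed records. The tab delimiter, the `#` comment convention and all names are assumptions; the source specifies only the column order [chr, pos, ref, alt, ac, an].

```python
import csv
import io
from typing import List, NamedTuple


class VariantQuery(NamedTuple):
    """One row of the six-column batch input."""
    chrom: str
    pos: int
    ref: str
    alt: str
    ac: int  # allele count of the variant in the user's cohort
    an: int  # allele number (total alleles) in the user's cohort


def parse_batch(text: str) -> List[VariantQuery]:
    """Parse tab-separated rows of chr, pos, ref, alt, ac, an."""
    variants = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text), delimiter="\t"), 1):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        if len(row) != 6:
            raise ValueError(f"line {line_no}: expected 6 columns, got {len(row)}")
        chrom, pos, ref, alt, ac, an = row
        variants.append(VariantQuery(chrom, int(pos), ref, alt, int(ac), int(an)))
    return variants
```

Validating the column count up front gives the user a line-numbered error before any annotation work starts.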

Overall, the test dataset contained 11,196 variants in 76 genes with reported expert curation from three groups: (i) a prostate cancer study [8]; (ii) a breast cancer study [9]; and (iii) ClinVar [10,11]. Missense mutations account for 71% of the variants, frameshift variants 14%, stop gain variants 5%, and the rest are distributed among splice variants, in-frame insertion/deletion and stop loss in the test dataset (Figure 1A, Table 3). We annotated the variants with CAVA, Annovar, ExAC-noTCGA, gnomAD and ClinVar and prepared the VCFs for PathoMAN.

Figure 1A: Distribution of mutations in test data by variant class.

(Stop loss - SL; Start codon alteration - IM; Splice variant that alters +5 splice site - SS5; Inframe ins/del - IF; Alters first 3 bases of codon - EE; Essential splice site +/− 1, 2 - ESS; Stop gain - SG; Frameshift - FS; Non-synonymous - NSY). Figure 1B: Overall comparison results of PathoMAN variant curation and expert curation for germline variants from test dataset. Figure 1C: PathoMAN results for reported pathogenic and likely pathogenic germline variants in test dataset. Figure 1D: PathoMAN results for reported benign and likely benign germline variants in test dataset.

PathoMAN achieves an overall concordance of 83.1% for P/LP variants and 75.5% for B/LB variants. A minimal discordance of 0.04% (P/LP) and 0.9% (B/LB) and a LOR of 8.6% (P/LP) and 23.6% (B/LB) were observed. PathoMAN achieves GOR by resolving 6.6% of the VUS as P/LP and 1.59% as B/LB (Figure 1B-D). We estimated reliability of recall using the Cohen’s kappa coefficient (K) between the manual curation and PathoMAN classification for the test dataset. The number of observed agreements was 89.2% (9988 unique variants) (K = 0.74, CI 95% 0.73-0.75).
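Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance from each rater's label frequencies: kappa = (p_o − p_e) / (1 − p_e). A minimal Python sketch of this calculation follows; the function name and the toy labels in the usage note are illustrative and are not the study data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)
```

For the toy input `cohens_kappa(["P", "P", "B", "B"], ["P", "B", "B", "B"])`, the observed agreement is 0.75 and the chance agreement is 0.5, giving a kappa of 0.5.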

Out of the 2,535 P/LP variants reported by manual curation, PathoMAN showed very high concordance for frameshift, essential splice sites and truncating variants. PathoMAN failed to resolve reported P/LP for 233 missense variants (Table 4A). All of these were low confidence variants in the ClinVar dataset with no assertion criteria provided (gold star=0) or variants with a single submitter (gold star=1). Similarly for reported benign variants (n=440), we observe high concordance with the exception of missense variant class. LOR is observed for 90 missense variants with conflicting reports in ClinVar (Table 4B). Many rare variants that are reported from clinical sequencing/studies have no public records documenting their pathogenicity classification and there may be insufficient evidence to call them either pathogenic or benign. These variants have been called VUS by the manual curators (n=8221 variants) at the time. PathoMAN is able to resolve 6.7% of the VUS as LP and 1.6% as LB (Table 4C).

PathoMAN results for ExAC (no TCGA) dataset

We asked if we could predict the different classes of ACMG mutations in this public resource using PathoMAN. We selected 55,566 exonic and essential splice variants from IMPACT-76 genes and classified them. Overall, PathoMAN calls <1% of the heterozygous genotypes in the ExAC-noTCGA dataset, from 53,105 samples, as P/LP. Further, we tabulated pathogenic variant burden by gene and compared it against ClinVar (Table 5). The results show that PathoMAN calls a similar number of variants to that reported in ClinVar for bona fide cancer genes like BRCA1 and BRCA2. PathoMAN also predicts a few rare variants as P/LP in these genes which have not been reported previously in ClinVar. Investigators who intend to use the ExAC-noTCGA dataset as controls in cancer sequencing studies can use PathoMAN to get a rapid count of variants across genes above and beyond those reported in ClinVar.

Table 4: PathoMAN re-classification of expertly curated germline cancer variant reported in the test datasets.

All tables show distribution by variant classes. Table A shows only reported P/LP variants; Table B shows only reported B/LB; and Table C shows only Reported VUS.

Usage of ACMG/AMP categories in PathoMAN

We analyzed the real-world usage of the eight categories of evidence (population frequency, genomic annotation and computational prediction, functional evidence, co-segregation, de novo status, allelic/genotypic data, public databases, scientific literature and other data) used in the ACMG/AMP guidelines. Interestingly, we find that the categories: population frequency data, genomic annotation and computational predictions, databases and scientific literature (Figure 2) are the most used. These are available due to generous data and tool-kit sharing policies in the genomics field. The categories that are rarely if ever used are familial cosegregation data or de novo status, allelic data, and functional data. The co-segregation data and de novo status data are limited to familial studies, and are mostly unavailable in sporadic case-control settings since these are collected by investigators, doctors, genetic counsellors and commercial labs based on patient input. For a variant to be classified as pathogenic or likely pathogenic by ACMG criteria, one needs a maximum of 1 PVS1 or 2 PSs or 3 PMs or 4 PPs for which, the knowledge-base and resources used by PathoMAN were demonstrably sufficient. We describe below the bottlenecks in sharing this information and propose a novel framework to circumvent and ameliorate these issues.

Figure 2: Utilization of knowledgebase components by PathoMAN during variant curation of the test datasets.

Table 5: Comparison of pathogenic gene burden in ExAC-noTCGA between ClinVar and PathoMAN.

Columns contain variant counts.

Discussion

PathoMAN as a tool to aid variant curation

Traditionally, genetic variant curation has been performed manually by expert groups of individuals. However, this is a time intensive task that requires aggregation and interpretation of information from multiple sources. In the cancer realm, this was relatively easy at times when only a single gene such as BRCA1/2 was under investigation. In contemporary testing scenarios, which routinely rely on multiplex gene-panels, this task is onerous. Large gene discovery efforts, as well as clinical reporting, could use a simplified, automated method for prioritizing variants for a closer look or, in the best case, as the classification tool of choice. PathoMAN addresses this critical unmet need for an unbiased algorithmic approach towards classifying genetic variants of clinical interest in cancer predisposition. PathoMAN can be easily accessed through a web browser and results for individual variants are almost immediately available, while batch uploads of variant VCFs may take a few hours.

Genetic testing labs have started utilizing the ACMG/AMP classification rules to classify variants for pathogenicity within cancer predisposition genes. However, results vary depending on availability of accessible data and interpretational differences [55]. Efforts are being made to resolve interpretational differences through initiatives such as ClinGen. In a recent report [56], 13% of variants in ClinVar were re-analyzed and were found to be unresolved, underscoring the difficulties even for expert curator groups. In that study [56], clinical interpretations from four clinical laboratories were concordant for 91.5% of shared variants in ClinVar after consultations. For manual or automated curation, the minimal set of information required to classify a variant as likely pathogenic or likely benign is: population frequency, computational predictors and evidence from public databases [56]. PathoMAN compiles this information uniformly in a machine accessible format which is used as a knowledge-base for variant classification. An advantage of using PathoMAN is that it can easily tag benign variants based on public allele frequency and the genomic context information from annotators. This reduces the variant pool of interest to a manageable subset. In a typical multiplexed gene-panel variant list, after filtering for only rare high or moderate impact variants, PathoMAN will classify about one third of the variants as B/LB with high precision. This saves time and effort for the variant curators and helps them to focus on curating the remaining potentially actionable variants. PathoMAN can also tag founder mutations.

Cancer is a complex disease with multi-gene aetiology. Some cancer genes confer high risk, whereas others only moderately affect the carrier’s risk. Panel testing is currently used for active surveillance and intervention to lower disease risk. Large sequencing and genotyping efforts to discover new cancer predisposition genes are being carried out by several consortia, such as BCAC57, SIMPLEXO58, COMPLEXO59 and CIMBA60. Several commercial and academic labs also now offer multiplexed panels. As the cost of sequencing these panels decreases, the number of genes tested in panels is increasing. Automation allows for rapid processing, service assurance and reproducibility of results for these large panels. Gold standard sets of curation pioneered by ClinGen51; 52 would aid in refining these pathogenicity classifications further, while efforts such as the PROMPT29 registry enable accurate penetrance estimates of mutations in susceptibility genes. The PROMPT registry has identified a 26% discordance rate among laboratories and an 11% rate of conflicting interpretations, a discrepancy that has implications for medical management.

In the three datasets we used to test PathoMAN, we demonstrate a high concordance rate for both pathogenic and benign variants when compared against an expert-curated set of variants in the IMPACT-76 genes. The concordance for P/LP variants is excellent when limited to truncating variants, frameshift variants and essential splice variants, such as those reported in the prostate cancer study8. When missense variants are also considered, as in the breast cancer study9, we see more VUS, but the discordance is still minimal.

Much of the discordance can be traced back to ClinVar submissions with conflicting interpretations, e.g. BRCA1:c.5348T>C (p.Met1783Thr). Currently, PathoMAN does not have access to proprietary databases (such as HGMD)62,63, which may have additional evidence for pathogenicity or benignity. For splice variants, our source is limited to positions −3 to +8 at the 5’ splice site and −12 to +2 at the 3’ splice site. Hence, we may be missing variants in extended splice regions.
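The splice windows quoted above can be written as a simple coordinate check. The sketch below is illustrative only: the function name and the site labels are hypothetical, and it assumes the usual convention that offsets are counted relative to the exon/intron boundary with no position 0 (so −1 and +1 are adjacent bases).

```python
# Splice-site windows as stated in the text: (lowest offset, highest offset).
SPLICE_WINDOWS = {
    "5prime": (-3, 8),   # donor site: -3 to +8
    "3prime": (-12, 2),  # acceptor site: -12 to +2
}


def in_splice_window(site: str, offset: int) -> bool:
    """Return True if a variant at `offset` from the splice junction falls
    inside the annotated window for the given site ("5prime" or "3prime")."""
    if site not in SPLICE_WINDOWS:
        raise ValueError(f"unknown splice site label: {site!r}")
    if offset == 0:
        # Assumed convention: offsets run ..., -2, -1, +1, +2, ...
        raise ValueError("offset 0 is not defined in this coordinate system")
    low, high = SPLICE_WINDOWS[site]
    return low <= offset <= high


if __name__ == "__main__":
    print(in_splice_window("5prime", 8))    # inside the donor window
    print(in_splice_window("3prime", -13))  # outside the acceptor window
```

A variant at, for example, +9 of a donor site returns False here, which is the kind of extended-region variant that the available splice annotations would not cover.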

Many labs and certain programs, such as CardioClassifier64 and InterVar65, use prior knowledge of disease-gene pair associations. This has the advantage of reducing P/LP classifications for genes that are not part of a disease-gene pair. However, it has the disadvantage that it cannot be used for lesser-known gene-disease pairs or for novel gene hunting. In a recent report, we showed that half of the cases in a series of selected advanced cancers at a single institution were non-syndromic associations5. Probands or their close relatives had clinically actionable variants in cancer genes not directly associated with the specific cancers for which there were known syndromic associations. PathoMAN does not use the contextual syndromic association in deciphering the pathogenicity of variants. For the applications envisaged in novel gene discovery, this is a distinct advantage. Clinical sequencing, which is more focused on specific sets of genes, is limited to disease-gene pairs when identifying pathogenic variants.

Variants that could not be classified by PathoMAN and are called VUS lack accessible supporting evidence for a clinical assertion under the ACMG guidelines. These are classic examples of rare variants absent from the ClinVar and ExAC datasets. For these LOR variants, we believe that the expert curators may have had additional evidence from the literature, in-house functional evidence66 or familial co-segregation information67 that helped classify them as P/LP or B/LB. The upgrade of a VUS to either LP or LB by PathoMAN is based on three categories: lines of available evidence in public databases, population frequency, and computational (in silico) predictions of deleteriousness. These variants can be re-classified as either pathogenic or benign if additional functional or co-segregation data become available through the literature or initiatives such as ClinGen56.

Commercial testing laboratories have proprietary interpretation pipelines, such as Sherloc68 (Invitae Corporation) and MyVISION (Myriad Genetics). However, these are unavailable to the community at large. PathoMAN is designed to provide an optimized platform for clinical variant calling using publicly available data resources.

Using ACMG for variant classification in Cancer

Mutations in tumor suppressors and oncogenes lead to tumorigenesis, and the Knudson two-hit hypothesis69 is seen to operate in many common cancers. Common examples include the APC, TP53 and BRCA1/2 genes. However, several of these genes are also found to be risk genes for autosomal dominant cancer predisposition, especially those that lead to rare autosomal recessive Mendelian disorders: genes of the Fanconi complex (FANCS-BRCA1, FANCD1-BRCA2, FANCJ-BRIP1, FANCN-PALB2, FANCP-SLX4, RAD51C), neurofibromatosis (NF1), ataxia-telangiectasia (ATM), Bloom syndrome (BLM), Nijmegen breakage syndrome (NBN) and dyskeratosis congenita (TERT). Heterozygous carriers of these gene mutations are reported to have increased risks for syndromic cancers70. Occasionally, rare gene-disrupting heterozygous mutations in these genes that are absent from public control datasets such as ExAC and gnomAD may be observed in sequenced cancer cohorts. Their ClinVar record for pathogenicity is usually based on the Mendelian recessive syndrome and not on the cancer phenotypes. Hence, applying the ACMG rules to genes without membership in the ACMG list may be fraught with misclassification. However, we believe that continuing data streams for variants in these genes will lead to better classifications, especially when coupled with familial co-segregation and functional validation. While PathoMAN classifications for such genes are a useful starting point for identifying variants that may be pathogenic and discarding benign ones, we emphasise the need for expert manual curation to disentangle these issues.

Limitations of automation

Automating variant classification based on publicly available information has some pitfalls. Supporting evidence provided in ClinVar for variants is not computation-friendly and requires manual curation to interpret free text. In several instances, the citations are not relevant to the specific records. Technologies such as natural language processing and tagging will eventually help to build a knowledge-base that can further be used for deep learning.

Current ACMG guidelines do not directly link ClinVar functional evidence provided as supporting observations, which leads to loss of information that could be used in variant classification. Due to this lack of data structure, the variants in ClinVar are scored only PP5 or BP6 and not PS3 or BS3. We employed a gold star status of 2 or more as a proxy for functional evidence. Not all clinical sequencing projects are equipped to perform, or do perform, independent analyses to assess functional evidence for their clinical assertions. If the ClinVar evidence were coded with proper tags, it would help molecular geneticists and clinical curators to use this information for their pathogenicity estimation. For example, the TP53 (R273H), BRCA1 (Y105C) and BRCA1 (V1688del) variants have overwhelming evidence in the literature (Supp Figure 2); however, the evidence present in the descriptions of the submissions within ClinVar cannot be derived computationally. Similarly, there are many variants reported in the literature which may have some level of supporting evidence for pathogenicity or benignity in ClinVar. Currently, all of this data integration is done by manual curators on a case-by-case basis.

We propose a framework to report ClinVar data that can be structured and parsable for an automated algorithm in the context of cancer. This format consists of 6 important fields that compress the vast information that is present in literature or clinical reports.

  1. Population/Ethnicity (NFE, AFR, SAS, AMR, ASJ, FIN, OTH, EAS, others)

  2. Inheritance model (AD, AR, de novo, X-linked)

  3. Allelic status (Hom, Het)

  4. Family history/Co-segregation information (Yes-1; No-0)

  5. Disease association (TCGA code/ Oncotree code71)

  6. Functional Evidence (Experiment type: NMD, LOH, etc.)

For example, the ERCC3 (R109X) variant72 can be depicted as ASJ-AD-Het-1:1-BRCA,BLCA-NMD. This variant was seen in Ashkenazi Jewish individuals, with autosomal dominant inheritance for the heterozygous allele. The variant co-segregated in one family with a history of cancer. It was found in individuals with breast cancer and bladder cancer, and the functional evidence for pathogenicity came from testing for nonsense-mediated decay and other experiments.
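To show that such a string is machine-parsable, here is a minimal Python sketch that splits the six fields of the example above. It is not part of PathoMAN: the class and function names are hypothetical, the co-segregation field is kept as raw text because its exact encoding ("1:1" in the example) is not fully specified, and the sketch assumes that no field value itself contains a hyphen (so a value such as "X-linked" would need a hyphen-free code).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EvidenceTag:
    population: str                 # e.g. "ASJ"
    inheritance: str                # e.g. "AD"
    allelic_status: str             # e.g. "Het"
    cosegregation: str              # kept as raw text, e.g. "1:1"
    diseases: List[str]             # TCGA/OncoTree codes, e.g. ["BRCA", "BLCA"]
    functional_evidence: List[str]  # experiment types, e.g. ["NMD"]


def parse_evidence_tag(tag: str) -> EvidenceTag:
    """Parse a six-field, hyphen-separated evidence string.

    Multi-valued fields (diseases, functional evidence) are comma-separated.
    """
    parts = tag.strip().split("-")
    if len(parts) != 6:
        raise ValueError(
            f"expected 6 hyphen-separated fields, got {len(parts)}: {tag!r}"
        )
    population, inheritance, allelic, coseg, diseases, functional = parts
    return EvidenceTag(
        population=population,
        inheritance=inheritance,
        allelic_status=allelic,
        cosegregation=coseg,
        diseases=[d for d in diseases.split(",") if d],
        functional_evidence=[f for f in functional.split(",") if f],
    )


if __name__ == "__main__":
    print(parse_evidence_tag("ASJ-AD-Het-1:1-BRCA,BLCA-NMD"))
```

A production format would probably use a delimiter that cannot occur inside field values, or a structured encoding such as JSON, but the compact string is enough to show that each field can be recovered without reading free text.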

Large sequencing studies and gene-specific functional studies provide curated lists of variants with their pathogenic impacts, such as the TP53 database28 and a functional study on PALB2 variants26; 27. As a primer, we have collated a list of PALB2 and TP53 variants from the literature to support the knowledge-base for PathoMAN, but there is a real need to create a publicly available, well-curated list of variants from the literature that is amenable to programmatic interpretation. Similarly, as standards evolve for the incorporation of somatic mutations into germline interpretation, we expect an integration of such events for at least some tumor suppressors and oncogenes. The roles played by the ENIGMA Consortium73; 74, GA4GH75 and BRCA-Share76 in this regard are meritorious. Although clinical laboratories collaborate to resolve differences in variant interpretations submitted to ClinVar56, the lack of a unified framework for incorporating supporting machine-readable evidence into any variant database, including ClinVar, remains a critical bottleneck.

Functional data are rarely available for most genes. Exceptions are BRCA1/2, due to the concerted efforts of the ENIGMA consortium73; 74. In single-variant reports, data are usually buried within scientific jargon that is not compatible with genomic variant information. In many instances, functional data depend on the models used, e.g. overexpression of a mutant construct, deletion of a region using a CRISPR endonuclease and, sometimes, introduction of the specific nucleotide through homology-directed DNA repair. It is also likely that the results from these three methods do not agree. Novel methods to understand deleteriousness using saturation mutagenesis are also starting to emerge77–79, e.g. for BRCA180, which we have incorporated into PathoMAN. We hope these will add a uniform layer of functional data that can be used in determining pathogenicity in the coming years.

In conclusion, we performed pathogenicity assessment of 66,762 variants in germline cancer genes (IMPACT-76), the first and largest uniform classification using an unbiased computational tool. The high concordance and low discordance with manual curation are a harbinger of how such programs will, in the near future, be able to work as well as domain experts and manual curators. PathoMAN is a first step towards our goal of automating the complex process of variant classification and interpretation. A beta version of the web app is available at https://pathoman.mskcc.org/

Web resources

  1. CAVA - https://github.com/RahmanTeam/CAVA

  2. SnpEff - http://snpeff.sourceforge.net/SnpEff_manual.html

  3. ANNOVAR - https://annovar.openbioinformatics.org

  4. ExACnoTCGA - http://exac.broadinstitute.org

  5. gnomAD - http://gnomad.broadinstitute.org/

  6. ClinVar - https://www.ncbi.nlm.nih.gov/clinvar/

  7. IARC database - http://p53.iarc.fr/

  8. ClinVar parser tool - https://github.com/macarthur-lab/clinvar

  9. dbNSFP and dbscSNV - https://sites.google.com/site/jpopgen/dbNSFP

  10. Gene List - https://github.com/macarthur-lab/gene_lists

  11. RepeatMasker - http://www.repeatmasker.org/

  12. UCSC Genome Browser - https://genome.ucsc.edu

  13. CardioClassifier - https://www.cardioclassifier.org/

  14. InterVar - https://github.com/WGLab/InterVar

  15. ACMG - https://www.acmg.net/

  16. PathoMAN - http://pathoman.mskcc.org/

Funding/Support

Research reported in this pre-print was supported by the National Cancer Institute of the National Institutes of Health under award numbers R21CA029533 and R21CA178800, as well as by Cycle for Survival, the Breast Cancer Research Foundation and The V Foundation for Cancer Research. It is also supported by the Cancer Center core grant P30CA008748 and The Robert and Kate Niehaus Center for Inherited Cancer Genomics. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or other funding agencies.

Acknowledgements

We thank Sabine Topka PhD, Semanti Mukherjee PhD, Maria Carlo MD and Zoe Steinsynder BS for helpful suggestions to improve the manuscript.

References

  1.
    Offit, K. (2017). Multigene Testing for Hereditary Cancer: When, Why, and How. J Natl Compr Canc Netw 15, 741–743.
  2.
    Desmond, A., Kurian, A.W., Gabree, M., Mills, M.A., Anderson, M.J., Kobayashi, Y., Horick, N., Yang, S., Shannon, K.M., Tung, N., et al. (2015). Clinical Actionability of Multigene Panel Testing for Hereditary Breast and Ovarian Cancer Risk Assessment. JAMA Oncol 1, 943–951.
  3.
    Richards, S., Aziz, N., Bale, S., Bick, D., Das, S., Gastier-Foster, J., Grody, W.W., Hegde, M., Lyon, E., Spector, E., et al. (2015). Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 17, 405–424.
  4.
    Pandey, K.R., Maden, N., Poudel, B., Pradhananga, S., and Sharma, A.K. (2012). The curation of genetic variants: difficulties and possible solutions. Genomics Proteomics Bioinformatics 10, 317–325.
  5.
    Mandelker, D., Zhang, L., Kemel, Y., Stadler, Z.K., Joseph, V., Zehir, A., Pradhan, N., Arnold, A., Walsh, M.F., Li, Y., et al. (2017). Mutation Detection in Patients With Advanced Cancer by Universal Sequencing of Cancer-Related Genes in Tumor and Normal DNA vs Guideline-Based Germline Testing. JAMA 318, 825–835.
  6.
    Zehir, A., Benayed, R., Shah, R.H., Syed, A., Middha, S., Kim, H.R., Srinivasan, P., Gao, J., Chakravarty, D., Devlin, S.M., et al. (2017). Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat Med 23, 703–713.
  7.
    https://www.mskcc.org/press-releases/msk-impact-first-tumor-profiling-multiplex-panel-authorized-fda-setting-new-pathway-market-future-oncopanels
  8.
    Pritchard, C.C., Mateo, J., Walsh, M.F., De Sarkar, N., Abida, W., Beltran, H., Garofalo, A., Gulati, R., Carreira, S., Eeles, R., et al. (2016). Inherited DNA-Repair Gene Mutations in Men with Metastatic Prostate Cancer. N Engl J Med 375, 443–453.
  9.
    Maxwell, K.N., Hart, S.N., Vijai, J., Schrader, K.A., Slavin, T.P., Thomas, T., Wubbenhorst, B., Ravichandran, V., Moore, R.M., Hu, C., et al. (2016). Evaluation of ACMG-Guideline-Based Variant Classification of Cancer Susceptibility and Non-Cancer-Associated Genes in Families Affected by Breast Cancer. Am J Hum Genet 98, 801–817.
  10.
    Landrum, M.J., Lee, J.M., Benson, M., Brown, G., Chao C., Chitipiralla, S., Gu, B., Hart, J., Hoffman, D., Hoover, J., et al. (2016). ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res 44, D862–868.
  11.
    Landrum, M.J., Lee, J.M., Riley, G.R., Jang, W., Rubinstein, W.S., Church, D.M., and Maglott, D.R. (2014). ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res 42, D980–985.
  12.
    Lek, M., Karczewski, K.J., Minikel, E.V., Samocha, K.E., Banks, E., Fennell, T., O’Donnell-Luria, A.H., Ware, J.S., Hill, A.J., Cummings, B.B., et al. (2016). Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291.
  13.
    Munz, M., Ruark, E., Renwick, A., Ramsay, E., Clarke, M., Mahamdallie, S., Cloke, V., Seal, S., Strydom, A., Lunter, G., et al. (2015). CSN and CAVA: variant annotation tools for rapid, robust next-generation sequencing analysis in the clinical setting. Genome Med 7, 76.
  14.
    Liu, X., Wu, C., Li, C., and Boerwinkle, E. (2016). dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Nonsynonymous and Splice-Site SNVs. Hum Mutat 37, 235–241.
  15.
    Liu, X., Jian, X., and Boerwinkle, E. (2013). dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations. Hum Mutat 34, E2393–2402.
  16.
    Liu, X., Jian, X., and Boerwinkle, E. (2011). dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Hum Mutat 32, 894–899.
  17.
    Wang, K., Li, M., and Hakonarson, H. (2010). ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38, e164.
  18.
    Blekhman, R., Man, O., Herrmann, L., Boyko, A.R., Indap, A., Kosiol, C., Bustamante, C.D., Teshima, K.M., and Przeworski, M. (2008). Natural selection on genes that underlie human disease susceptibility. Curr Biol 18, 883–889.
  19.
    Berg, J.S., Adams, M., Nassar, N., Bizon, C., Lee, K., Schmitt, C.P., Wilhelmsen, K.C., and Evans, J.P. (2013). An informatics approach to analyzing the incidentalome. Genet Med 15, 36–44.
  20.
    Rahner, N., and Steinke, V. (2008). Hereditary cancer syndromes. Dtsch Arztebl Int 105, 706–714.
  21.
    Cheng, D.T., Mitchell, T.N., Zehir, A., Shah, R.H., Benayed, R., Syed, A., Chandramohan, R., Liu, Z.Y., Won, H.H., Scott, S.N., et al. (2015). Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): A Hybridization Capture-Based Next-Generation Sequencing Clinical Assay for Solid Tumor Molecular Oncology. J Mol Diagn 17, 251–264.
  22.
    Cheng, D.T., Prasad, M., Chekaluk, Y., Benayed, R., Sadowska, J., Zehir, A., Syed, A., Wang, Y.E., Somar, J., Li Y., et al. (2017). Comprehensive detection of germline variants by MSK-IMPACT, a clinical diagnostic platform for solid tumor molecular oncology and concurrent cancer predisposition testing. BMC Med Genomics 10, 33.
  23.
    https://github.com/macarthur-lab/gene_lists
  24.
    Thorstenson, Y.R., Shen, P., Tusher, V.G., Wayne, T.L., Davis, R.W., Chu, G., and Oefner, P.J. (2001). Global analysis of ATM polymorphism reveals significant functional constraint. Am J Hum Genet 69, 396–412.
  25.
    https://www.ncbi.nlm.nih.gov/clinvar/docs/details/
  26.
    Janatova, M., Kleibl, Z., Stribrna, J., Panczak, A., Vesela, K., Zimovjanova, M., Kleiblova, P., Dundr, P., Soukupova, J., and Pohlreich, P. (2013). The PALB2 gene is a strong candidate for clinical testing in BRCA1- and BRCA2-negative hereditary breast cancer. Cancer Epidemiol Biomarkers Prev 22, 2323–2332.
  27.
    Southey, M.C., Goldgar, D.E., Winqvist, R., Pylkas, K., Couch, F., Tischkowitz, M., Foulkes, W.D., Dennis, J., Michailidou, K., van Rensburg, E.J., et al. (2016). PALB2, CHEK2 and ATM rare variants and cancer risk: data from COGS. J Med Genet 53, 800–811.
  28.
    Bouaoun, L., Sonkin, D., Ardin, M., Hollstein, M., Byrnes, G., Zavadil, J., and Olivier, M. (2016). TP53 Variations in Human Cancers: New Lessons from the IARC TP53 Database and Genomics Data. Hum Mutat 37, 865–876.
  29.
    Balmana, J., Digiovanni, L., Gaddam, P., Walsh, M.F., Joseph, V., Stadler, Z.K., Nathanson, K.L., Garber, J.E., Couch, F.J., Offit, K., et al. (2016). Conflicting Interpretation of Genetic Variants and Cancer Risk by Commercial Laboratories as Assessed by the Prospective Registry of Multiplex Testing. J Clin Oncol 34, 4071–4078.
  30.
    Apweiler, R., Bairoch, A., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., et al. (2004). UniProt: the Universal Protein knowledgebase. Nucleic Acids Res 32, D115–119.
  31.
    Speir, M.L., Zweig, A.S., Rosenbloom, K.R., Raney, B.J., Paten, B., Nejad, P., Lee, B.T., Learned, K., Karolchik, D., Hinrichs, A.S., et al. (2016). The UCSC Genome Browser database: 2016 update. Nucleic Acids Res 44, D717–725.
  32.
    Tarailo-Graovac, M., and Chen, N. (2009). Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics Chapter 4, Unit 4 10.
  33.
    Casper, J., Zweig, A.S., Villarreal, C., Tyner, C., Speir, M.L., Rosenbloom, K.R., Raney, B.J., Lee, C.M., Lee, B.T., Karolchik D., et al. (2017). The UCSC Genome Browser database: 2018 update. Nucleic Acids Res.
  34.
    Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and Haussler, D. (2002). The human genome browser at UCSC. Genome Res 12, 996–1006.
  35.
    Dong, C., Wei, P., Jian, X., Gibbs, R., Boerwinkle, E., Wang, K., and Liu, X. (2015). Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Hum Mol Genet 24, 2125–2137.
  36.
    Kircher, M., Witten, D.M., Jain, P., O’Roak, B.J., Cooper, G.M., and Shendure, J. (2014). A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46, 310–315.
  37.
    Shihab, H.A., Gough, J., Cooper, D.N., Day, I.N., and Gaunt, T.R. (2013). Predicting the functional consequences of cancer-associated amino acid substitutions. Bioinformatics 29, 1504–1510.
  38.
    Chun, S., and Fay, J.C. (2009). Identification of deleterious mutations within three human genomes. Genome Res 19, 1553–1561.
  39.
    Reva, B., Antipin, Y., and Sander, C. (2011). Predicting the functional impact of protein mutations: application to cancer genomics. Nucleic Acids Res 39, e118.
  40.
    Schwarz, J.M., Rodelsperger, C., Schuelke, M., and Seelow, D. (2010). MutationTaster evaluates disease-causing potential of sequence alterations. Nat Methods 7, 575–576.
  41.
    Choi, Y., and Chan, A.P. (2015). PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics 31, 2745–2747.
  42.
    Adzhubei, I.A., Schmidt, S., Peshkin, L., Ramensky, V.E., Gerasimova, A., Bork, P., Kondrashov, A.S., and Sunyaev, S.R. (2010). A method and server for predicting damaging missense mutations. Nat Methods 7, 248–249.
  43.
    Kumar, P., Henikoff, S., and Ng, P.C. (2009). Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4, 1073–1081.
  44.
    Carter, H., Douville, C., Stenson, P.D., Cooper, D.N., and Karchin, R. (2013). Identifying Mendelian disease genes with the variant effect scoring tool. BMC Genomics 14 Suppl 3, S3.
  45.
    Jagadeesh, K.A., Wenger, A.M., Berger, M.J., Guturu, H., Stenson, P.D., Cooper, D.N., Bernstein, J.A., and Bejerano, G. (2016). M-CAP eliminates a majority of variants of uncertain significance in clinical exomes at high sensitivity. Nat Genet 48, 1581–1586.
  46.
    Ghosh, R., Oak, N., and Plon, S.E. (2017). Evaluation of in silico algorithms for use with ACMG/AMP clinical variant interpretation guidelines. Genome Biol 18, 225.
  47.
    Jian, X., Boerwinkle, E., and Liu, X. (2014). In silico prediction of splice-altering single nucleotide variants in the human genome. Nucleic Acids Res 42, 13534–13544.
  48.
    Jian, X., and Liu, X. (2017). In Silico Prediction of Deleteriousness for Nonsynonymous and Splice-Altering Single Nucleotide Variants in the Human Genome. Methods Mol Biol 1498, 191–197.
  49.
    Waite, K.A., and Eng, C. (2002). Protean PTEN: form and function. Am J Hum Genet 70, 829–844.
  50.
    Zerdoumi, Y., Aury-Landas, J., Bonaiti-Pellie, C., Derambure C., Sesboue, R., Renaux-Petel, M., Frebourg, T., Bougeard, G., and Flaman, J.M. (2013). Drastic effect of germline TP53 missense mutations in Li-Fraumeni patients. Hum Mutat 34, 453–461.
  51.
    Patel, R.Y., Shah, N., Jackson, A.R., Ghosh, R., Pawliczek, P., Paithankar, S., Baker, A., Riehle, K., Chen, H., Milosavljevic, S., et al. (2017). ClinGen Pathogenicity Calculator: a configurable system for assessing pathogenicity of genetic variants. Genome Med 9, 3.
  52.
    Rehm, H.L., Berg, J.S., Brooks, L.D., Bustamante, C.D., Evans, J.P., Landrum, M.J., Ledbetter, D.H., Maglott, D.R., Martin, C.L., Nussbaum, R.L., et al. (2015). ClinGen--the Clinical Genome Resource. N Engl J Med 372, 2235–2242.
  53.
    Zhang, X., Minikel, E.V., O’Donnell-Luria, A.H., MacArthur, D.G., Ware, J.S., and Weisburd, B. (2017). ClinVar data parsing. Wellcome Open Res 2, 33.
  54.
    https://www.mskcc.org/blog/new-tumor-sequencing-test-will-bring-personalized-treatment-options-more-patients
  55.
    Amendola, L.M., Jarvik, G.P., Leo, M.C., McLaughlin, H.M., Akkari, Y., Amaral, M.D., Berg, J.S., Biswas, S., Bowling, K.M., Conlin, L.K., et al. (2016). Performance of ACMG-AMP Variant-Interpretation Guidelines among Nine Laboratories in the Clinical Sequencing Exploratory Research Consortium. Am J Hum Genet 99, 247.
  56.
    Harrison, S.M., Dolinsky, J.S., Knight Johnson, A.E., Pesaran, T., Azzariti, D.R., Bale, S., Chao, E.C., Das, S., Vincent, L., and Rehm, H.L. (2017). Clinical laboratories collaborate to resolve differences in variant interpretations submitted to ClinVar. Genet Med 19, 1096–1104.
  57.
    Breast Cancer Association, C. (2006). Commonly studied single-nucleotide polymorphisms and breast cancer: results from the Breast Cancer Association Consortium. J Natl Cancer Inst 98, 1382–1396.
  58.
    Hart, S.N., Maxwell, K.N., Thomas, T., Ravichandran, V., Wubberhorst, B., Klein, R.J., Schrader, K., Szabo, C., Weitzel, J.N., Neuhausen, S.L., et al. (2016). Collaborative science in the next-generation sequencing era: a viewpoint on how to combine exome sequencing data across sites to identify novel disease susceptibility genes. Brief Bioinform 17, 672–677.
  59.
    Complexo, Southey, M.C., Park, D.J., Nguyen-Dumont, T., Campbell, I., Thompson, E., Trainer, A.H., Chenevix-Trench, G., Simard, J., Dumont, M., et al. (2013). COMPLEXO: identifying the missing heritability of breast cancer via next generation collaboration. Breast Cancer Res 15, 402.
  60.
    Chenevix-Trench, G., Milne, R.L., Antoniou, A.C., Couch, F.J., Easton, D.F., Goldgar, D.E., and Cimba. (2007). An international initiative to identify genetic modifiers of cancer risk in BRCA1 and BRCA2 mutation carriers: the Consortium of Investigators of Modifiers of BRCA1 and BRCA2 (CIMBA). Breast Cancer Res 9, 104.
  62.
    Stenson, P.D., Mort, M., Ball, E.V., Evans, K., Hayden, M., Heywood, S., Hussain, M., Phillips, A.D., and Cooper, D.N. (2017). The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Hum Genet 136, 665–677.
  63.
    Cooper, D.N., and Krawczak, M. (1996). Human Gene Mutation Database. Hum Genet 98, 629.
  64.
    Whiffin, N., Walsh, R., Govind, R., Edwards, M., Ahmad, M., Zhang, X., Tayal, U., Buchan, R., Midwinter, W., Wilk, A., et al. (2017). CardioClassifier: demonstrating the power of disease-and gene-specific computational decision support for clinical genome interpretation. bioRxiv.
  65.
    Li, Q., and Wang, K. (2017). InterVar: Clinical Interpretation of Genetic Variants by the 2015 ACMG-AMP Guidelines. Am J Hum Genet 100, 267–280.
  66.
    Kelly, M.A., Caleshu, C., Morales, A., Buchan, J., Wolf, Z., Harrison, S.M., Cook, S., Dillon, M.W., Garcia, J., Haverfield, E., et al. (2018). Adaptation and validation of the ACMG/AMP variant classification framework for MYH7-associated inherited cardiomyopathies: recommendations by ClinGen’s Inherited Cardiomyopathy Expert Panel. Genet Med.
  67.
    Jarvik, G.P., and Browning, B.L. (2016). Consideration of Cosegregation in the Pathogenicity Classification of Genomic Variants. Am J Hum Genet 98, 1077–1081.
  68.
    Nykamp, K., Anderson, M., Powers, M., Garcia, J., Herrera, B., Ho, Y.Y., Kobayashi, Y., Patil, N., Thusberg, J., Westbrook, M., et al. (2017). Sherloc: a comprehensive refinement of the ACMG-AMP variant classification criteria. Genet Med 19, 1105–1117.
  69.
    Knudson, A.G., Jr. (1971). Mutation and cancer: statistical study of retinoblastoma. Proc Natl Acad Sci U S A 68, 820–823.
  70.
    Olsen, J.H., Hahnemann, J.M., Borresen-Dale, A.L., Brondum-Nielsen, K., Hammarstrom, L., Kleinerman, R., Kaariainen, H., Lonnqvist, T., Sankila, R., Seersholm, N., et al. (2001). Cancer in patients with ataxia-telangiectasia and in their relatives in the nordic countries. J Natl Cancer Inst 93, 121–127.
  71.
    http://oncotree.mskcc.org/oncotree/
  72.
    Vijai, J., Topka, S., Villano, D., Ravichandran, V., Maxwell, K.N., Maria, A., Thomas, T., Gaddam, P., Lincoln, A., Kazzaz, S., et al. (2016). A Recurrent ERCC3 Truncating Mutation Confers Moderate Risk for Breast Cancer. Cancer Discov 6, 1267–1275.
  73.
    Guidugli, L., Carreira, A., Caputo, S.M., Ehlen, A., Galli, A., Monteiro, A.N., Neuhausen, S.L., Hansen, T.V., Couch, F.J., Vreeswijk, M.P., et al. (2014). Functional assays for analysis of variants of uncertain significance in BRCA2. Hum Mutat 35, 151–164.
  74.
    Spurdle, A.B., Healey, S., Devereau, A., Hogervorst, F.B., Monteiro, A.N., Nathanson, K.L., Radice, P., Stoppa-Lyonnet, D., Tavtigian, S., Wappenschmidt, B., et al. (2012). ENIGMA--evidence-based network for the interpretation of germline mutant alleles: an international initiative to evaluate risk and clinical significance associated with sequence variation in BRCA1 and BRCA2 genes. Hum Mutat 33, 2–7.
  75.
    https://www.ga4gh.org/
  76.
    Beroud, C., Letovsky, S.I., Braastad, C.D., Caputo, S.M., Beaudoux, O., Bignon, Y.J., Bressac-De Paillerets, B., Bronner, M., Buell, C.M., Collod-Beroud, G., et al. (2016). BRCA Share: A Collection of Clinical BRCA Gene Variants. Hum Mutat 37, 1318–1328.
  77.
    Findlay, G.M., Boyle, E.A., Hause, R.J., Klein, J.C., and Shendure, J. (2014). Saturation editing of genomic regions by multiplex homology-directed repair. Nature 513, 120–123.
  78.
    Starita, L.M., Young, D.L., Islam, M., Kitzman, J.O., Gullingsrud, J., Hause, R.J., Fowler, D.M., Parvin, J.D., Shendure, J., and Fields, S. (2015). Massively Parallel Functional Analysis of BRCA1 RING Domain Variants. Genetics 200, 413–422.
  79.
    Starita, L.M., Ahituv, N., Dunham, M.J., Kitzman, J.O., Roth, F.P., Seelig, G., Shendure, J., and Fowler, D.M. (2017). Variant Interpretation: Functional Assays to the Rescue. Am J Hum Genet 101, 315–325.
  80.
    Starita, L.M., Islam, M.M., Banerjee, T., Adamovich, A.I., Gullingsrud, J., Fields, S., Shendure, J., and Parvin, J.D. (2018). A multiplexed homology-directed DNA repair assay reveals the impact of ~1,700 BRCA1 variants on protein function. bioRxiv.
Posted May 01, 2018.
Subject Area

  • Genetics