Abstract
Predicting which cryptic-donors may be activated by a genetic variant is notoriously difficult. Through analysis of 5,145 cryptic-donors activated by 4,811 variants, compared with 86,963 decoy-donors that were not used (any GT or GC dinucleotide not used as a donor), we define an empirical method that predicts cryptic-donor activation with 87% sensitivity and 95% specificity. Strength (according to four algorithms) and proximity to the authentic-donor appear to be important determinants of cryptic-donor activation. However, other factors, such as auxiliary splicing elements that are difficult to identify, play an important role and are likely responsible for current prediction inaccuracies. We find that the most frequent mis-splicing events at each exon-intron junction, mined from 40,233 RNA-sequencing samples, predict with remarkable accuracy which cryptic-donor will be activated in rare disease. Aggregate RNA-sequencing splice-junction data provides an accurate, evidence-based method to predict variant-activated cryptic-donors in genetic disorders, assisting diagnostic pathology in considering the possible consequences of a variant for the encoded protein and in planning RNA diagnostic testing strategies.
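The ranking idea summarised above, prioritising the cryptic-donor whose junction is seen most often in aggregate RNA-sequencing data at that exon-intron junction, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the functions `candidate_donors` and `rank_cryptic_donors`, the 0-based offset convention, the dictionary of per-position sample counts, and the proximity tie-break are all hypothetical. Candidate donors are enumerated as every GT or GC dinucleotide, mirroring the decoy-donor definition.

```python
from __future__ import annotations


def candidate_donors(seq: str) -> list[int]:
    """Return 0-based offsets of every GT or GC dinucleotide in seq.

    Hypothetical convention: each offset marks the first intronic base
    of a potential donor (the G of GT/GC).
    """
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in ("GT", "GC")]


def rank_cryptic_donors(
    seq: str,
    authentic_pos: int,
    junction_counts: dict[int, int],
    top_n: int = 4,
) -> list[int]:
    """Rank candidate cryptic-donors by aggregate RNA-seq evidence.

    junction_counts maps a donor offset to the number of RNA-seq samples
    in which a junction from that donor was observed (hypothetical input
    format). Ties, including candidates never observed (count 0), are
    broken by distance from the authentic donor, nearest first.
    """
    # Exclude the authentic donor itself; everything else is a cryptic candidate.
    candidates = [p for p in candidate_donors(seq) if p != authentic_pos]
    ranked = sorted(
        candidates,
        key=lambda p: (-junction_counts.get(p, 0), abs(p - authentic_pos)),
    )
    return ranked[:top_n]


# Toy example: authentic donor at offset 3; sample counts are made up.
seq = "CAGGTAAGTGCAGGCAAT"
counts = {3: 900, 7: 4, 13: 25}
print(rank_cryptic_donors(seq, authentic_pos=3, junction_counts=counts))
# -> [13, 7, 9]
```

With these illustrative counts, the candidate at offset 13 ranks first because its junction was observed in the most samples; the candidate at offset 9, never observed, ranks last. Breaking ties by distance reflects the abstract's observation that proximity to the authentic-donor matters, but the exact tie-break rule is an assumption of this sketch.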
Competing Interest Statement
S.T.C. and H.J. are named inventors of Intellectual Property (IP), described in part within this manuscript, that is owned jointly by the University of Sydney and Sydney Children's Hospitals Network. S.T.C. is director of Frontier Genomics Pty Ltd (Australia), which has licensed this IP. S.T.C. receives no payment or other financial incentives for services provided to Frontier Genomics Pty Ltd (Australia). Frontier Genomics Pty Ltd (Australia) has no existing financial relationships that will benefit from publication of these data. The remaining co-authors declare no conflicts of interest.