Abstract
Mutations that cause genetic diseases can be difficult to identify when they alter the splice form of the transcript rather than the sequence of the protein. However, information theory-based methods have been shown to accurately predict deleterious changes caused by genomic variants that affect splicing. We made several such predictions for SNPs that were found to change the strength of natural and/or cryptic splice sites. We evaluated a selected set of 22 SNPs that we predicted by information analysis to affect splicing, validated these with targeted expression analysis, and compared the results with genome-scale interpretation of RNAseq data from tumors. The abundances of natural and predicted splice isoforms were quantified by q-RT-PCR and with probeset intensities from exon microarrays, using RNA isolated from HapMap lymphoblastoid cell lines containing the predicted deleterious variants. These SNPs reside within the following genes: XRCC4, IL19, C21orf2, UBASH3A, TTC3, PRAME, EMID1, ARFGAP3, GUSBP11 (Fλ8), WBP2NL, LPP, IFI44L, CFLAR, FAM3B, CYB5R3, COL6A2, BCR, BACE2, CLDN14, TMPRSS3 and DERL3. Fifteen of these SNPs showed a significant change in the use of the affected splice site. For 3 of these SNPs, individuals homozygous for the stronger allele had higher transcription of the associated gene than individuals with the weaker allele. Thirteen SNPs had a direct effect on exon inclusion, while 10 altered cryptic site use. In 4 genes, individuals of the same genotype showed high expression variability caused by other factors, which masked potential effects of the SNP. Targeted expression analyses for 8 SNPs in this study were confirmed by the results of genome-wide information theory and expression analyses.