TY - JOUR
T1 - The contribution of alternative splicing probability to the coding expansion of the genome
JF - bioRxiv
DO - 10.1101/048124
SP - 048124
AU - Fernando Carrillo Oesterreich
AU - Hugo Bowne-Anderson
AU - Jonathon Howard
Y1 - 2016/01/01
UR - http://biorxiv.org/content/early/2016/04/11/048124.abstract
N2 - Alternative splicing results in the inclusion or exclusion of exons in an RNA, thereby allowing a single gene to code for multiple RNA isoforms. Genes are often composed of many exons, allowing combinatorial choice to significantly expand the coding potential of the genome. How much coding potential is gained by alternative splicing, and what is the main contributor: alternative-splicing depth or exon count? Here we develop a splice-site-centric quantification method that allows us to characterize transcriptome-wide alternative splicing with a simple probabilistic model, enabling species-wide comparison. We use information theory to quantify the gain in coding potential and show that an increase in alternative splicing probability contributes more to transcriptome expansion than exon count does. Our results suggest that dominant isoforms are co-expressed alongside many minor isoforms. We propose that this solves two problems simultaneously: the expression of functional isoforms, and the expansion of the transcriptome landscape with isoforms that potentially lack a direct function but are available for evolution.
Transcriptome: Set of all RNA molecules in a sample (e.g. cell, tissue, organism).
Transcriptome expansion: Increase in the coding potential of the genome.
Gene annotation: Meta information added to the raw DNA sequence, such as exon-intron structure.
Gene architecture: Exon-intron structure of genes.
RNA Splicing: RNA maturation event leading to the removal of introns and the joining of exons.
Intron: Sequence removed by splicing, often not protein-coding.
Exon: Sequence retained by splicing, often coding.
Splice site: Exon-intron boundary (5’ splice site) or intron-exon boundary (3’ splice site).
Constitutive splicing: The process that results in the joining of the same two splice sites in all observed situations.
Alternative splicing: The process by which one splice site can be joined to distinct partner splice sites.
RNA-seq experiment: Qualitative and quantitative profiling of the transcriptome by deep sequencing.
Extent: A parameter used to characterize the amount of alternative splicing in any given transcriptome; technically, the extent is defined from α, the exponent of the power-law distribution that describes the amount of alternative splicing in the transcriptome.
Splice site expression: Number of RNA-seq observations per splice site.
Shannon Entropy: Metric of the expected information content.
True Diversity: An ecological concept that measures both the number of distinct species (richness) and how uniformly they are distributed in a sample (evenness).
Machine Learning: Computational algorithms that learn rules (a model) to predict an output from an input.
Random Forest: A non-linear machine learning model based on an ensemble of decision trees with random feature-subset selection at each decision node.
Lasso Regression: Linear regression regularized by the sum of the absolute values of all regression coefficients (L1 norm).
Bootstrapping: Resampling technique used to infer the confidence in a population measurement.
Probability density function (pdf): A function of a random variable X that describes the relative likelihood of X taking each of its possible values.
Kernel density estimation: A method of estimating the probability density function from a finite sample of data.
Bayesian Inference: A method of statistical inference in which prior knowledge is recursively updated with new data using Bayes’ theorem in order to make statements about probabilistic hypotheses.
Prior distribution: The distribution (pdf) that mathematically formalizes one’s belief about the state of the system before empirical evidence (data) is taken into account (note that this distribution can be a mathematical formalization of a state of ignorance).
Posterior distribution: The distribution that describes the probability of the random variable in question after the evidence (data) is taken into account.
ER -
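The glossary defines Shannon entropy and true diversity, the two information-theoretic measures the abstract relies on to quantify coding-potential gain. The sketch below is only an illustration of those two definitions applied to splice-site usage; it is not the authors' splice-site-centric method. It assumes that true diversity means the Hill number of order 1, i.e. the exponential of the Shannon entropy taken in the same base, and the junction counts are hypothetical. The function names `shannon_entropy` and `true_diversity` are illustrative, not from the paper.

```python
import math
from collections.abc import Sequence


def shannon_entropy(counts: Sequence[float], base: float = 2.0) -> float:
    """Shannon entropy H = -sum(p_i * log(p_i)) of the relative frequencies
    derived from non-negative counts; zero counts contribute nothing."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for c in counts:
        if c < 0:
            raise ValueError("counts must be non-negative")
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


def true_diversity(counts: Sequence[float]) -> float:
    """True diversity of order 1 (assumed Hill-number form): the effective
    number of equally abundant categories, D = 2 ** H when H is in bits."""
    return 2.0 ** shannon_entropy(counts, base=2.0)


# Hypothetical read counts for one 5' splice site joined to three distinct
# 3' partner splice sites: one dominant partner plus two minor ones.
junction_counts = [90, 5, 5]
print(round(shannon_entropy(junction_counts), 3))
print(round(true_diversity(junction_counts), 3))
```

A constitutively spliced site (a single partner) gives an entropy of 0 and a true diversity of 1, while a site whose partners are used equally often has a true diversity equal to the number of partners. The dominant-plus-minor case above lands between those extremes, which is the kind of intermediate effective diversity the abstract's "dominant isoforms co-expressed alongside many minor isoforms" picture implies.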