TY  - JOUR
T1  - Validated Bayesian differentiation of causative and passenger mutations
JF  - bioRxiv
DO  - 10.1101/097931
SP  - 097931
AU  - Frederick R. Cross
AU  - Michal Breker
AU  - Kristi Lieberman
Y1  - 2017/01/01
UR  - http://biorxiv.org/content/early/2017/01/03/097931.abstract
N2  - In many contexts, the problem arises of determining which of many candidate mutations is the most likely to be causative for some phenotype. It is desirable to have a way to evaluate this probability that relies as little as possible on previous knowledge, to avoid bias against discovering new genes or functions. We are isolating mutants with blocked cell cycle progression in Chlamydomonas, and determining mutant genome sequences. Due to the intensity of UV mutagenesis required for efficient mutant collection, the mutants contain multiple mutations altering coding sequence. To provide a quantitative estimate of the probability that each individual mutation in a given mutant is the causative one, we develop a Bayesian approach. The approach employs four independent indicators: sequence conservation of the mutated coding sequence with Arabidopsis; severity of the mutation relative to Chlamydomonas wild type based on Blosum62 scores; meiotic mapping information for location of the causative mutation relative to known molecular markers; and, for a subset of mutants, transcriptional profile of the candidate wild type genes through the mitotic cell cycle. These indicators are statistically independent, and so can be combined quantitatively into a single probability calculation. We validate this calculation: recently isolated mutations that were not in the training set for developing the indicators, with high calculated probability of causality, are confirmed in every case by additional genetic data to indeed be causative. Analysis of best reciprocal blast relationships among Chlamydomonas and other eukaryotes indicates that the Ts-lethal mutants that our procedure recovers are highly enriched for fundamental cell-essential functions conserved broadly across plants and other eukaryotes, accounting for the high information content of sequence alignment to Arabidopsis.
ER  - 