RT Journal Article SR Electronic T1 Mutation saturation for fitness effects at human CpG sites JF bioRxiv FD Cold Spring Harbor Laboratory SP 2021.06.02.446661 DO 10.1101/2021.06.02.446661 A1 Ipsita Agarwal A1 Molly Przeworski YR 2021 UL http://biorxiv.org/content/early/2021/06/02/2021.06.02.446661.abstract AB Whole exome sequences have now been collected for millions of humans, with the related goals of identifying pathogenic mutations in patients and establishing reference repositories of data from unaffected individuals. As a result, we are approaching an important limit, in which datasets are large enough that, in the absence of natural selection, every highly mutable site will have experienced at least one mutation in the genealogical history of the sample. Here, we focus on putatively-neutral, synonymous CpG sites that are methylated in the germline and experience mutations to T at an elevated rate of ~10-7 per site per generation; in a sample of 390,000 individuals, ~99% of such CpG sites harbor a C/T polymorphism. These CpG sites provide a natural mutation saturation experiment for fitness effects: as we show, at current sample sizes, not seeing a polymorphism is indicative of strong selection against that mutation. We rely on this idea in order to directly identify a subset of highly deleterious CpG transitions, including ~27% of possible loss-of-function mutations, and up to 21% of possible missense mutations, depending on the type of site in which they occur. Unlike methylated CpGs, most mutation types, with rates on the order of 10-8 or 10-9, remain very far from saturation. We discuss what this contrast implies about interpreting the potential clinical relevance of mutations from their presence or absence in reference databases and for inferences about the fitness effects of new mutations.Competing Interest StatementThe authors have declared no competing interest.