Abstract
Non-B DNA structures formed by repetitive sequence motifs are known instigators of mutagenesis in experimental systems. Analyzing this phenomenon computationally in the human genome requires careful disentangling of intrinsic confounding factors, including overlapping motifs, interrupted motifs, and recurrent sequencing errors. Accounting for these factors eliminates all signals of repeat-induced mutagenesis that extend beyond the motif boundary, and eliminates or dramatically reduces the signal of mutagenesis within some motifs, contradicting previous reports. The mutagenesis that is not attributable to artifacts reveals several biological mechanisms. Polymerase slippage generates frequent indels within every variety of short tandem repeat (STR) motif, implicating slipped-strand structures. Interruption-correcting single-nucleotide variants (SNVs) within STRs distinctly implicate error-prone Polκ. Secondary-structure formation promotes SNVs within palindromic repeats, as well as duplications within direct repeats. G-quadruplex motifs cause recurrent sequencing errors, while mutagenesis at Z-DNA motifs is conspicuously absent.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Extensive revisions, largely driven by a new analysis of indels.