Long-read sequencing technology indicates genome-wide effects of non-B DNA on polymerization speed and error rate

  1. Kateryna D. Makova³
  1. ¹Bioinformatics and Genomics Graduate Program, Penn State University, University Park, Pennsylvania 16802, USA;
  2. ²Department of Statistics, Penn State University, University Park, Pennsylvania 16802, USA;
  3. ³Department of Biology, Penn State University, University Park, Pennsylvania 16802, USA;
  4. ⁴Department of Biophysics of Nucleic Acids, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic;
  5. ⁵Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 612 65 Brno, Czech Republic;
  6. ⁶Department of Pathology, Penn State University, College of Medicine, Hershey, Pennsylvania 17033, USA;
  7. ⁷Sant'Anna School of Advanced Studies, 56127 Pisa, Italy
  1. ⁸These authors contributed equally to this work.

  • Corresponding authors: kdm16{at}psu.edu, chiaro{at}stat.psu.edu
  • Abstract

    DNA conformation may deviate from the classical B-form in ∼13% of the human genome. Non-B DNA regulates many cellular processes; however, its effects on DNA polymerization speed and accuracy have not been investigated genome-wide. Such an inquiry is critical for understanding neurological diseases and cancer genome instability. Here, we present the first simultaneous examination of DNA polymerization kinetics and errors in the human genome sequenced with Single-Molecule Real-Time (SMRT) technology. We show that polymerization speed differs between non-B and B-DNA: It decelerates at G-quadruplexes and fluctuates periodically at disease-causing tandem repeats. Analyzing polymerization kinetics profiles, we predict and validate experimentally non-B DNA formation for a novel motif. We demonstrate that several non-B motifs affect sequencing errors (e.g., G-quadruplexes increase error rates), and that sequencing errors are positively associated with polymerase slowdown. Finally, we show that highly divergent G4 motifs have pronounced polymerization slowdown and high sequencing error rates, suggesting similar mechanisms for sequencing errors and germline mutations.

    Footnotes

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.241257.118.

    • Freely available online through the Genome Research Open Access option.

    • Received June 29, 2018.
    • Accepted October 30, 2018.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
