Optimized Illumina PCR-free library preparation for bacterial whole genome sequencing and analysis of factors influencing de novo assembly

BMC Res Notes. 2016 May 12:9:269. doi: 10.1186/s13104-016-2072-9.

Abstract

Background: Next-generation sequencing (NGS) technology has paved the way for rapid and cost-efficient de novo sequencing of bacterial genomes. In particular, the introduction of PCR-free library preparation procedures (LPPs) lead to major improvements as PCR bias is largely reduced. However, in order to facilitate the assembly of Illumina paired-end sequence data and to enhance assembly performance, an increase of insert sizes to facilitate the repeat bridging and resolution capabilities of current state of the art assembly tools is needed. In addition, information concerning the relationships between genomic GC content, library insert size and sequencing quality as well as the influence of library insert size, read length and sequencing depth on assembly performance would be helpful to specifically target sequencing projects.

Results: Optimized DNA fragmentation settings and fine-tuned resuspension buffer to bead buffer ratios during fragment size selection were integrated in the Illumina TruSeq(®) DNA PCR-free LPP in order to produce sequencing libraries varying in average insert size for bacterial genomes within a range of 35.4-73.0 % GC content. The modified protocol consumes only half of the reagents per sample, thus doubling the number of preparations possible with a kit. Examination of different libraries revealed that sequencing quality decreases with increased genomic GC content and with larger insert sizes. The estimation of assembly performance using assembly metrics like corrected NG50 and NGA50 showed that libraries with larger insert sizes can result in substantial assembly improvements as long as appropriate assembly tools are chosen. However, such improvements seem to be limited to genomes with a low to medium GC content. A positive trend between read length and assembly performance was observed while sequencing depth is less important, provided a minimum coverage is reached.

Conclusions: Based on the optimized protocol developed, sequencing libraries with flexible insert sizes and lower reagent costs can be generated. Furthermore, increased knowledge about the interplay of sequencing quality, insert size, genomic GC content, read length, sequencing depth and the assembler used will help molecular biologists to set up an optimal experimental and analytical framework with respect to Illumina next-generation sequencing of bacterial genomes.

Keywords: Assembler; Assembly performance; Average library insert size; Genomic GC content; Illumina; Next generation sequencing; PCR-free sample preparation; Read depth; Read size; Sequencing quality.

Publication types

  • Validation Study

MeSH terms

  • DNA, Bacterial / genetics
  • Genome, Bacterial*
  • Polymerase Chain Reaction / methods*

Substances

  • DNA, Bacterial