pRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires

Bioinformatics. 2014 Jul 1;30(13):1930-2. doi: 10.1093/bioinformatics/btu138. Epub 2014 Mar 10.

Abstract

Driven by dramatic technological improvements, large-scale characterization of lymphocyte receptor repertoires via high-throughput sequencing is now feasible. This approach is promising, but the high germline and somatic diversity of these repertoires, especially B-cell immunoglobulin repertoires, presents analytical challenges that require specialized computational pipelines. We developed the REpertoire Sequencing TOolkit (pRESTO) for processing reads from high-throughput lymphocyte receptor studies. pRESTO processes raw sequences to produce error-corrected, sorted and annotated sequence sets, along with detailed metrics at each processing step. The toolkit supports multiplexed primer pools, single- or paired-end reads, and emerging technologies that use single-molecule identifiers. pRESTO has been tested on data generated by Roche and Illumina platforms. It can parallelize work across the available processors and efficiently processes the millions of sequences generated by typical high-throughput projects.
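The abstract mentions producing error-corrected sequence sets and supporting single-molecule identifiers (UMIs). The general idea behind UMI-based error correction can be sketched as follows. Reads that share a UMI are assumed to come from the same template molecule, so a per-position majority vote across them suppresses random sequencing errors. This is a minimal, generic sketch under stated assumptions: reads are already paired with their UMI and pre-aligned to equal length, and the function names (group_by_umi, consensus, error_correct) and thresholds (min_count, min_freq) are hypothetical. It is not pRESTO's implementation or command-line interface.

```python
from collections import Counter, defaultdict


def group_by_umi(reads):
    """Group (umi, sequence) pairs by UMI (hypothetical helper)."""
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    return dict(groups)


def consensus(seqs, min_freq=0.6):
    """Column-wise majority vote over reads sharing one UMI.

    Assumption: all reads in the group are pre-aligned to equal length.
    Positions where the most common base falls below min_freq become 'N'.
    """
    if not seqs:
        raise ValueError("empty read group")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("reads in a UMI group must be pre-aligned to equal length")
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_freq else "N")
    return "".join(out)


def error_correct(reads, min_count=2, min_freq=0.6):
    """Build one consensus per UMI, dropping UMIs with too few reads."""
    result = {}
    for umi, seqs in group_by_umi(reads).items():
        if len(seqs) >= min_count:
            result[umi] = consensus(seqs, min_freq)
    return result


if __name__ == "__main__":
    reads = [
        ("AAGT", "ACGTAC"),
        ("AAGT", "ACGTAC"),
        ("AAGT", "ACCTAC"),  # one simulated sequencing error at position 3
        ("TTCA", "GGGTTT"),  # singleton UMI, below min_count
    ]
    print(error_correct(reads))
```

In this example, the three reads tagged AAGT collapse to ACGTAC because G outvotes the erroneous C at the third position (2 of 3 reads), and the singleton TTCA group is discarded. Requiring a minimum group size is one simple way to avoid trusting consensus sequences built from too little evidence.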

Availability and implementation: pRESTO is freely available for academic use. The software package and detailed tutorials may be downloaded from http://clip.med.yale.edu/presto.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • High-Throughput Nucleotide Sequencing / methods*
  • Lymphocytes / immunology*
  • Receptors, Immunologic / chemistry*
  • Receptors, Immunologic / immunology
  • Sequence Analysis, DNA
  • Sequence Analysis, RNA
  • Software

Substances

  • Receptors, Immunologic