Confirmatory Results

SNP-sites: rapid efficient extraction of SNPs from multi-FASTA alignments

Andrew J. Page, Ben Taylor, Aidan J. Delaney, Jorge Soares, Torsten Seemann, Jacqueline A. Keane, Simon R. Harris
doi: https://doi.org/10.1101/038190
Andrew J. Page
Pathogen Genomics, Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK, CB10 1SA.
  • For correspondence: andrew.page@sanger.ac.uk
Ben Taylor
Pathogen Genomics, Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK, CB10 1SA.
Aidan J. Delaney
Computing, Engineering and Mathematics, University of Brighton, Moulsecoomb, Brighton, UK, BN2 4GJ.
Jorge Soares
Pathogen Genomics, Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK, CB10 1SA.
Torsten Seemann
Victorian Life Sciences Computation Initiative, The University of Melbourne, Parkville, Australia.
Jacqueline A. Keane
Pathogen Genomics, Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK, CB10 1SA.
Simon R. Harris
Pathogen Genomics, Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK, CB10 1SA.

ABSTRACT

Rapidly decreasing genome sequencing costs have led to a proportionate increase in the number of samples used in prokaryotic population studies. Extracting single nucleotide polymorphisms (SNPs) from a large whole-genome alignment is now a routine task, but existing tools have failed to scale efficiently with the increased size of studies. These tools are slow, memory-inefficient and installed through non-standard procedures. We present SNP-sites, which can rapidly extract SNPs from a multi-FASTA alignment using modest resources and can output results in multiple formats for downstream analysis. SNPs can be extracted from an 8.3 GB alignment file (1,842 taxa, 22,618 sites) in 267 seconds using 59 MB of RAM and 1 CPU core, making it feasible to run on modest computers. SNP-sites is easy to install through the Debian and Homebrew package managers, and has been successfully tested on more than 20 operating systems. It is implemented in C and is available under the open source license GNU GPL version 3.
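To make the task concrete, the following is a minimal Python sketch of what extracting SNP sites from a multi-FASTA alignment involves. It is an illustration only, not the SNP-sites implementation (which is written in C and designed to use far less memory). The function names `parse_fasta`, `snp_columns` and `extract_snps` are hypothetical, and the choice to count only the unambiguous bases A, C, G and T when deciding whether a column is variable is an assumption made here for simplicity.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse multi-FASTA text into {name: sequence}, keeping input order."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequences may be wrapped over several lines; collect the pieces.
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def snp_columns(seqs: Dict[str, str]) -> List[int]:
    """Return 0-based alignment columns that hold more than one of A, C, G, T.

    Assumption of this sketch: gaps ('-') and ambiguity codes such as 'N'
    are ignored when deciding whether a column is variable.
    """
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences in an alignment must have equal length")
    length = lengths.pop()
    columns = []
    for i in range(length):
        bases = {s[i] for s in seqs.values()} & set("ACGT")
        if len(bases) > 1:
            columns.append(i)
    return columns


def extract_snps(seqs: Dict[str, str]) -> Dict[str, str]:
    """Reduce each sequence to the variable columns only."""
    cols = snp_columns(seqs)
    return {name: "".join(seq[i] for i in cols) for name, seq in seqs.items()}


if __name__ == "__main__":
    alignment = ">s1\nACGTA\n>s2\nACCTA\n>s3\nAC-TT\n"
    for name, snps in extract_snps(parse_fasta(alignment)).items():
        print(f">{name}\n{snps}")
```

This sketch holds the whole alignment in memory, which is exactly the cost that becomes prohibitive for multi-gigabyte alignments; a tool aimed at such inputs would need to avoid doing so.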

ABBREVIATIONS

SNP: single nucleotide polymorphism
VCF: Variant Call Format
Copyright: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
    Posted January 29, 2016.