Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold

Nat Methods. 2019 Jul;16(7):603-606. doi: 10.1038/s41592-019-0437-4. Epub 2019 Jun 24.

Abstract

The open-source de novo protein-level assembler Plass (https://plass.mmseqs.com) assembles six-frame-translated sequencing reads into protein sequences. It recovers 2 to 10 times more protein sequences from complex metagenomes and can assemble huge datasets. We assembled two redundancy-filtered reference protein catalogs: 2 billion sequences from 640 soil samples (soil reference protein catalog) and 292 million sequences from 775 marine eukaryotic metatranscriptomes (marine eukaryotic reference catalog), the largest free collections of protein sequences.
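
To make "six-frame-translated" concrete, the sketch below translates one nucleotide read in all six reading frames: three offsets on the forward strand and three on the reverse complement. It is an illustration only, not Plass's implementation. It assumes the standard genetic code and plain A/C/G/T reads, and the helper names (`reverse_complement`, `translate`, `six_frame_translate`) are hypothetical.

```python
# Illustrative sketch of six-frame translation; not taken from Plass.
# Assumption: standard genetic code, reads over the alphabet A/C/G/T.

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
# Map each codon to its amino acid; '*' marks a stop codon.
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(read: str) -> list[str]:
    """Return translations for forward frames 0-2, then reverse frames 0-2."""
    read = read.upper()
    frames = []
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


if __name__ == "__main__":
    for index, protein in enumerate(six_frame_translate("ATGGCCTAA")):
        print(index, protein)
```

A protein-level assembler would then work on the translated fragments rather than on the nucleotide reads themselves; how Plass selects and extends those fragments is not described in this abstract.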

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Codon
  • Metagenomics*
  • Open Reading Frames
  • Proteins / chemistry*

Substances

  • Codon
  • Proteins