bioRxiv
New Results

A method for achieving complete microbial genomes and improving bins from metagenomics data

Lauren M. Lui, Torben N. Nielsen, Adam P. Arkin
doi: https://doi.org/10.1101/2020.03.05.979740
Lauren M. Lui
1. Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Torben N. Nielsen
1. Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Adam P. Arkin
1. Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
2. Department of Bioengineering, University of California, Berkeley, CA, USA
3. Innovative Genomics Institute, Berkeley, CA, USA
  • For correspondence: aparkin@lbl.gov

Abstract

Metagenomics facilitates the study of the genetic information from uncultured microbes and complex microbial communities. Assembling complete microbial genomes (i.e., circular with no misassemblies) from metagenomics data is difficult because most samples have high organismal complexity and strain diversity. Fewer than 100 circularized bacterial and archaeal genomes have been assembled from metagenomics data despite the thousands of datasets that are available. Circularized genomes are important for (1) building a reference collection as scaffolds for future assemblies, (2) providing complete gene content of a genome, (3) confirming little or no contamination of a genome, (4) studying the genomic context and synteny of genes, and (5) linking protein coding genes to ribosomal RNA genes to aid metabolic inference in 16S rRNA gene sequencing studies. We developed a method to achieve circularized genomes using iterative assembly, binning, and read mapping. In addition, this method exposes potential misassemblies from k-mer based assemblies. We chose species of the Candidate Phyla Radiation (CPR) to focus our initial efforts because they have small genomes and are only known to have one ribosomal RNA operon. We present 34 circular CPR genomes, one circular Margulisbacteria genome, and two circular megaphage genomes from 19 public and published datasets. We demonstrate findings that would likely be difficult without circularizing genomes, including that ribosomal genes are likely not operonic in the majority of CPR, and that some CPR harbor diverged forms of RNase P RNA. Code and a tutorial for this method are available at https://github.com/lmlui/Jorg.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • Added more references and a link to the code on GitHub.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted July 18, 2020.
Subject Area

  • Microbiology
  • Bioinformatics