bioRxiv
New Results

Easy and Accurate Reconstruction of Whole HIV Genomes from Short-Read Sequence Data

Chris Wymant, François Blanquart, Astrid Gall, Margreet Bakker, Daniela Bezemer, Nicholas J. Croucher, Tanya Golubchik, Matthew Hall, Mariska Hillebregt, Swee Hoe Ong, Jan Albert, Norbert Bannert, Jacques Fellay, Katrien Fransen, Annabelle Gourlay, M. Kate Grabowski, Barbara Gunsenheimer-Bartmeyer, Huldrych F. Günthard, Pia Kivelä, Roger Kouyos, Oliver Laeyendecker, Kirsi Liitsola, Laurence Meyer, Kholoud Porter, Matti Ristola, Ard van Sighem, Guido Vanham, Ben Berkhout, Marion Cornelissen, Paul Kellam, Peter Reiss, Christophe Fraser, The BEEHIVE Collaboration
doi: https://doi.org/10.1101/092916
Chris Wymant
(1) Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, UK
(2) Medical Research Council Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK
For correspondence: wymant@well.ox.ac.uk
François Blanquart
(2) Medical Research Council Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK
Astrid Gall
(3) Virus Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
Margreet Bakker
(4) Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands
Daniela Bezemer
(5) Stichting HIV Monitoring, Amsterdam, The Netherlands
Nicholas J. Croucher
(2) Medical Research Council Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK
Tanya Golubchik
(1) Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, UK
(6) Wellcome Trust Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, UK
Matthew Hall
(1) Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, UK
(2) Medical Research Council Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK
Mariska Hillebregt
(5) Stichting HIV Monitoring, Amsterdam, The Netherlands
Swee Hoe Ong
(3) Virus Genomics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
Jan Albert
(7) Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
(8) Department of Clinical Microbiology, Karolinska University Hospital, Stockholm, Sweden
Norbert Bannert
(9) Division for HIV and other Retroviruses, Robert Koch Institute, Berlin, Germany
Jacques Fellay
(10) School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Switzerland
(11) Swiss Institute of Bioinformatics, Lausanne, Switzerland
Katrien Fransen
(12) HIV/STI reference laboratory, WHO collaborating centre, Institute of Tropical Medicine, Department of Clinical Science, Antwerpen, Belgium
Annabelle Gourlay
(13) Department of Infection and Population Health, University College London, London, UK
M. Kate Grabowski
(14) Johns Hopkins University, Baltimore, USA
Barbara Gunsenheimer-Bartmeyer
(15) Department of Infectious Disease Epidemiology, Robert Koch Institute, Berlin, Germany
Huldrych F. Günthard
(16) Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, Zurich, Switzerland
(17) Institute of Medical Virology, University of Zurich, Switzerland
Pia Kivelä
(18) Department of Infectious Diseases, Helsinki University Hospital, Helsinki, Finland
Roger Kouyos
(16) Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, Zurich, Switzerland
(17) Institute of Medical Virology, University of Zurich, Switzerland
Oliver Laeyendecker
(19) Laboratory of Immunoregulation, NIAID, NIH, Baltimore, USA
Kirsi Liitsola
(18) Department of Infectious Diseases, Helsinki University Hospital, Helsinki, Finland
Laurence Meyer
(20) INSERM CESP U1018, Université Paris Sud, Université Paris Saclay, APHP, Service de Santé Publique, Hôpital de Bicêtre, Le Kremlin-Bicêtre, France
Kholoud Porter
(13) Department of Infection and Population Health, University College London, London, UK
Matti Ristola
(18) Department of Infectious Diseases, Helsinki University Hospital, Helsinki, Finland
Ard van Sighem
(5) Stichting HIV Monitoring, Amsterdam, The Netherlands
Guido Vanham
(21) Virology Unit, Immunovirology Research Pole, Biomedical Sciences Department, Institute of Tropical Medicine, Antwerpen, Belgium
Ben Berkhout
(4) Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands
Marion Cornelissen
(4) Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands
Paul Kellam
(22) Kymab Ltd, Cambridge, UK
(23) Division of Infectious Diseases, Department of Medicine, Imperial College London, London, UK
Peter Reiss
(5) Stichting HIV Monitoring, Amsterdam, The Netherlands
Christophe Fraser
(1) Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, UK
(2) Medical Research Council Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK

Abstract

Next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of rapid between- and within-host evolution may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by effectively aligning the reads to themselves, producing a set of sequences called contigs. However, contigs provide only a partial summary of the reads, misassembly may give them an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems, we developed the tool shiver, which preprocesses reads for quality and contamination, then maps them to a reference tailored to the sample, built from corrected contigs supplemented with existing reference sequences. Because it runs with two commands per sample, it can easily be used for large, heterogeneous data sets. We use shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read data produced with the Illumina platform, for 65 existing publicly available samples and 50 new samples. We show the systematic superiority of mapping to shiver's constructed reference over mapping the same reads to the standard reference HXB2: an average of 29 bases per sample are called differently, of which 98.5% are supported by higher coverage. We also provide a practical guide to working with imperfect contigs.
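The abstract turns on three ideas: calling a consensus base at each genome position from mapped reads, recording minority variants, and comparing the calls obtained by mapping the same reads to two different references. The Python sketch below illustrates those ideas only; it is not shiver's implementation. It assumes per-position base counts (the hypothetical `pileup` input) are already available, that both mappings have been placed in one shared coordinate system, and that the thresholds `min_coverage` and `min_minority_frac` are illustrative values chosen here, not shiver's defaults. The toy counts are invented for demonstration and are not data from the study.

```python
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"


def call_position(counts: Dict[str, int], min_coverage: int = 15,
                  min_minority_frac: float = 0.01
                  ) -> Tuple[Optional[str], int, Dict[str, float]]:
    """Call one position from per-base read counts (illustrative thresholds)."""
    coverage = sum(counts.get(b, 0) for b in BASES)
    if coverage < min_coverage:
        # Too few reads to make a call at this position.
        return None, coverage, {}
    # Majority base; max() keeps the first of tied bases, in ACGT order.
    base = max(BASES, key=lambda b: counts.get(b, 0))
    # Minority variants: other bases at or above the frequency threshold.
    minority = {b: counts.get(b, 0) / coverage for b in BASES
                if b != base and counts.get(b, 0) / coverage >= min_minority_frac}
    return base, coverage, minority


def call_consensus(pileup: List[Dict[str, int]], **thresholds
                   ) -> Tuple[str, List[int]]:
    """Consensus string ('N' where coverage is too low) plus per-position coverage."""
    seq, covs = [], []
    for counts in pileup:
        base, cov, _ = call_position(counts, **thresholds)
        seq.append(base if base is not None else "N")
        covs.append(cov)
    return "".join(seq), covs


def compare_calls(seq_a: str, cov_a: List[int], seq_b: str, cov_b: List[int]
                  ) -> Tuple[int, float]:
    """Positions called differently, and the fraction where mapping A had more reads.

    Assumes both consensuses are already in one shared coordinate system.
    """
    diffs = [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    if not diffs:
        return 0, 0.0
    a_higher = sum(1 for i in diffs if cov_a[i] > cov_b[i])
    return len(diffs), a_higher / len(diffs)


# Toy counts (invented) for the same three positions after mapping to two references.
tailored = [{"A": 40, "G": 2}, {"C": 30}, {"T": 25, "C": 5}]
hxb2 = [{"A": 38}, {"C": 3}, {"C": 20, "T": 4}]
seq_t, cov_t = call_consensus(tailored)
seq_h, cov_h = call_consensus(hxb2)
print(seq_t, seq_h)                                # ACT ANC
print(compare_calls(seq_t, cov_t, seq_h, cov_h))   # (2, 1.0)
```

In this toy case both differing positions have more reads under the tailored mapping, which mirrors the kind of coverage-based comparison the abstract reports against HXB2. A real comparison would first need to align the two consensus sequences, which this sketch deliberately skips.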

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted December 13, 2016.
Subject Area: Bioinformatics