Multiplex padlock targeted sequencing reveals human hypermutable CpG variations
- Jin Billy Li1,6,9,
- Yuan Gao2,6,
- John Aach1,6,
- Kun Zhang3,6,
- Gregory V. Kryukov4,6,
- Bin Xie2,
- Annika Ahlford1,7,
- Jung-Ki Yoon1,8,
- Abraham M. Rosenbaum1,
- Alexander Wait Zaranek1,
- Emily LeProust5,
- Shamil R. Sunyaev4 and
- George M. Church1,9
- 1 Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA;
- 2 Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia 23284, USA;
- 3 Department of Bioengineering, University of California, San Diego, California 92093, USA;
- 4 Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA;
- 5 Genomics Solution Unit, Agilent Technologies Inc., Santa Clara, California 95051, USA
6 These authors contributed equally to this work.
Abstract
Utilizing the full power of next-generation sequencing often requires large-scale multiplex enrichment of many specific genomic loci in multiple samples. Several technologies for this have recently been developed, but they still require substantial improvement. We report a 10,000-fold improvement of a previously developed padlock-based approach and apply the assay to identify genetic variations in hypermutable CpG regions across human chromosome 21. From ∼3 million reads derived from a single Illumina Genome Analyzer lane, ∼94% (∼50,500) of target sites were observed with at least one read. The uniformity of coverage was also greatly improved: up to 93% and 57% of all targets fell within a 100-fold and a 10-fold coverage range, respectively. Alleles at >400,000 target base positions were determined across six subjects and examined for single nucleotide polymorphisms (SNPs); concordance with independently obtained genotypes was 98.4%–100%. We detected >500 SNPs not currently in dbSNP, 362 of which were in targeted CpG locations. Transitions at CpG sites were at least 13.7 times more abundant than non-CpG transitions. Fractions of polymorphic CpG sites are lower in CpG-rich regions, and polymorphism fractions correlate more strongly with human–chimpanzee divergence at CpG sites than at non-CpG sites. This is consistent with the hypothesis that heterogeneity in methylation rate along chromosomes contributes to mutation rate variation in humans. These results suggest that targeted CpG resequencing is an efficient way to identify both common and rare genetic variations. In addition, the substantially improved padlock capture technology can be readily applied to other projects that require multiplex sample preparation.
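Two quantities in the abstract can be made concrete with a short sketch: the coverage-uniformity figures (the share of targets falling within a 10-fold or 100-fold coverage range) and the split of SNPs into CpG transitions, non-CpG transitions and transversions. The code below is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes uniformity means the largest fraction of all targets whose read counts lie inside some window [c, fold × c] with c > 0, with zero-coverage targets counted in the denominator only. It also assumes a SNP is "in a CpG" when the reference base is the C or the G of a CG dinucleotide on the given strand. The function names `fraction_within_fold_range` and `classify_snp` are hypothetical.

```python
from bisect import bisect_right

# Transitions are purine<->purine or pyrimidine<->pyrimidine substitutions.
TRANSITIONS = {("C", "T"), ("T", "C"), ("A", "G"), ("G", "A")}


def fraction_within_fold_range(coverages, fold):
    """Largest fraction of targets whose coverage lies within [c, fold * c].

    Assumed definition: zero-coverage targets count toward the total but
    never fall inside a window (hypothetical helper, not the paper's code).
    """
    if not coverages:
        return 0.0
    covered = sorted(c for c in coverages if c > 0)
    best = 0
    for i, low in enumerate(covered):
        # An optimal window can always start at an observed coverage value;
        # bisect_right gives the index just past the last value <= fold * low.
        j = bisect_right(covered, fold * low)
        best = max(best, j - i)
    return best / len(coverages)


def in_cpg(seq, pos):
    """True if 0-based `pos` is the C or the G of a CpG dinucleotide."""
    s = seq.upper()
    base = s[pos]
    if base == "C":
        return pos + 1 < len(s) and s[pos + 1] == "G"
    if base == "G":
        return pos > 0 and s[pos - 1] == "C"
    return False


def classify_snp(seq, pos, alt):
    """Label a reference->alt change at `pos` (hypothetical helper)."""
    ref = seq[pos].upper()
    if (ref, alt.upper()) not in TRANSITIONS:
        return "transversion"
    return "CpG transition" if in_cpg(seq, pos) else "non-CpG transition"


if __name__ == "__main__":
    coverage = [0, 1, 5, 10, 50, 1000]
    print(fraction_within_fold_range(coverage, 10))   # 3 of 6 targets
    print(fraction_within_fold_range(coverage, 100))  # 4 of 6 targets
    print(classify_snp("ACGTA", 1, "T"))  # C of a CpG mutated to T
```

Anchoring each candidate window at an observed coverage value keeps the search to one binary search per target, which stays cheap even for tens of thousands of targets. The CpG test looks only at the reference strand, because a C>T change at a CpG on one strand appears as G>A at the G of the same CpG on the other.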
Footnotes
9 Corresponding authors.
E-mail http://arep.med.harvard.edu/gmc/email.html; fax (617) 432-6513.
E-mail jli{at}genetics.med.harvard.edu; fax (617) 432-6513.
[Supplemental material is available online at http://www.genome.org. The sequence data from this study have been submitted to the NCBI Short Read Archive (http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi) under accession no. SRA007914.]
Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.092213.109.
- Received February 11, 2009.
- Accepted May 20, 2009.
- Copyright © 2009 by Cold Spring Harbor Laboratory Press