Comprehensive, integrated, and phased whole-genome analysis of the primary ENCODE cell line K562
- Bo Zhou1,2,
- Steve S. Ho1,2,
- Stephanie U. Greer3,
- Xiaowei Zhu1,2,
- John M. Bell4,
- Joseph G. Arthur5,13,
- Noah Spies2,6,7,14,
- Xianglong Zhang1,2,
- Seunggyu Byeon8,
- Reenal Pattni1,2,
- Noa Ben-Efraim1,2,
- Michael S. Haney1,2,
- Rajini R. Haraksingh1,2,15,
- Giltae Song8,
- Hanlee P. Ji3,4,
- Dimitri Perrin9,
- Wing H. Wong5,10,
- Alexej Abyzov11 and
- Alexander E. Urban1,2,12
- 1Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California 94305, USA;
- 2Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA;
- 3Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, California 94305, USA;
- 4Stanford Genome Technology Center, Stanford University, Palo Alto, California 94304, USA;
- 5Department of Statistics, Stanford University, Stanford, California 94305, USA;
- 6Department of Pathology, Stanford University School of Medicine, Stanford, California 94305, USA;
- 7Genome-Scale Measurements Group, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA;
- 8School of Computer Science and Engineering, College of Engineering, Pusan National University, Busan 46241, South Korea;
- 9Science and Engineering Faculty, Queensland University of Technology, Brisbane, QLD 4001, Australia;
- 10Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California 94305, USA;
- 11Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota 55905, USA;
- 12Tashia and John Morgridge Faculty Scholar, Stanford Child Health Research Institute, Stanford, California 94305, USA
Abstract
K562 is widely used in biomedical research. It is one of the three tier-one cell lines of ENCODE and is also the cell line most commonly used for large-scale CRISPR/Cas9 screens. Although its functional genomic and epigenomic characteristics have been extensively studied, its genome sequence and genomic structural features have never been comprehensively analyzed. Such information is essential for the correct interpretation and understanding of the vast troves of existing functional genomics and epigenomics data for K562. We performed and integrated deep-coverage whole-genome (short-insert), mate-pair, and linked-read sequencing as well as karyotyping and array CGH analysis to identify a wide spectrum of genome characteristics in K562: copy numbers (CN) of aneuploid chromosome segments at high resolution; SNVs and indels (both corrected for CN in aneuploid regions); loss of heterozygosity; megabase-scale phased haplotypes, often spanning entire chromosome arms; and structural variants (SVs), including small and large-scale complex SVs and nonreference retrotransposon insertions. Many SVs were phased, assembled, and experimentally validated. We identified multiple allele-specific deletions and duplications within the tumor suppressor gene FHIT. Taking aneuploidy into account, we reanalyzed K562 RNA-seq and whole-genome bisulfite sequencing data for allele-specific expression and allele-specific DNA methylation. We also show examples of how deeper insights into regulatory complexity are gained by integrating genomic variant information and structural context with functional genomics and epigenomics data. Furthermore, using K562 haplotype information, we produced an allele-specific CRISPR targeting map. This comprehensive whole-genome analysis serves as a resource for future studies that utilize K562 as well as a framework for the analysis of other cancer genomes.
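To illustrate why SNVs and indels need to be corrected for CN in aneuploid regions, the following is a minimal sketch and not the authors' pipeline. It assumes a simple binomial read-sampling model in which a variant present on k of CN copies has an expected alt-allele fraction of k/CN. The function names and the `error_rate` parameter are hypothetical and exist only for this illustration.

```python
from math import log


def expected_allele_fractions(copy_number):
    """List the possible alt-allele fractions k/CN for k = 0..CN."""
    return [k / copy_number for k in range(copy_number + 1)]


def most_likely_alt_copies(alt_reads, total_reads, copy_number, error_rate=0.01):
    """Pick the alt-allele copy count k that best explains the read counts.

    Assumption (illustrative only): reads are binomially sampled with
    success probability k/CN, clamped by a small sequencing error rate so
    that k = 0 and k = CN still tolerate a few discordant reads.
    """
    best_k, best_loglik = 0, float("-inf")
    for k in range(copy_number + 1):
        p = min(max(k / copy_number, error_rate), 1.0 - error_rate)
        # The binomial coefficient is the same for every k, so it is omitted.
        loglik = alt_reads * log(p) + (total_reads - alt_reads) * log(1.0 - p)
        if loglik > best_loglik:
            best_k, best_loglik = k, loglik
    return best_k


if __name__ == "__main__":
    # In a CN = 4 segment, 25 alt reads out of 100 fit one variant copy of four.
    print(expected_allele_fractions(4))
    print(most_likely_alt_copies(25, 100, 4))
```

Under a diploid assumption, an alt-allele fraction of 0.25 would look like a poorly supported heterozygous call, whereas in a CN = 4 segment it is the expected signal for a variant carried on a single copy.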
Footnotes
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.234948.118.
- Received January 22, 2018.
- Accepted December 28, 2018.
This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.