Abstract
With long-read sequencing, we have entered an era where individual genomes are routinely assembled to near completion and where complex genetic variation can efficiently be resolved. Here, we demonstrate that long reads can be applied to study the genomic architecture of individual human cells. Clonally expanded CD8+ T-cells from a human donor were used as starting material for a droplet-based multiple displacement amplification (dMDA) to generate long molecules with minimal amplification bias. PacBio HiFi sequencing generated up to 20 Gb data and 40% genome coverage per single cell. The data allowed for accurate detection and haplotype phasing of single nucleotide variants (SNVs), structural variants (SVs), and tandem repeats, including in genomic regions inaccessible by short reads. Somatic SNVs were detected in the nuclear genome and mitochondrial DNA. An average of 1278 high-confidence SVs per cell were discovered in the PacBio data, nearly four times as many compared to those found in Illumina dMDA data from clonally related cells. Single-cell de novo assembly resulted in a genome size of up to 598 Mb and 1762 (12.8%) complete gene models. In summary, the work presented here demonstrates the utility of whole genome amplification combined with long-read sequencing toward the characterization of the full spectrum of genetic variation at the single-cell level.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
Long-read data has been added from additional single cells. Also, a bulk DNA sample from the same individual has been sequenced to high coverage with PacBio HiFi reads.