ABSTRACT
We developed a targeted sequencing method for intact high molecular weight (HMW) DNA targets as large as 0.2 Mb. The process uses HMW DNA isolated from intact cells, custom-designed Cas9-guide RNA complexes to generate 0.1–0.2 Mb DNA targets, electrophoretic isolation of those targets, and sequencing with barcode-linked reads. We used alignment methods as well as local assembly of the target regions to identify haplotypes and structural variants (SVs) across multi-megabase genomic regions. To demonstrate the performance of this approach, we designed three assays: one covering a 0.2 Mb region surrounding the BRCA1 gene, one comprising 40 overlapping 0.2 Mb targets that span the entire 4 Mb MHC locus, and one targeting 18 well-characterized SVs. Using the highly characterized NA12878 genome, we achieved on-target coverage of more than 50X, while overall whole-genome coverage was approximately 4X. We generated haplotypes that completely covered each targeted locus, with a maximum size of 4 Mb (for the MHC region). The method detected SVs such as deletions and inversions and determined their exact breakpoints and genotypes. Even breakpoints inside highly homologous segmental duplications were precisely determined from our high-quality assemblies. Overall, this method provides a new way to sequence large, intact DNA segments and to resolve their haplotypes and SVs.
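As a rough illustration of the figures quoted above, the following minimal sketch tiles a locus with overlapping fixed-length targets and computes the fold enrichment implied by the on-target and whole-genome coverage. It is not the assay design procedure. The function names, the evenly spaced layout, and the locus-relative coordinates (0 to 4,000,000) are assumptions made for illustration; actual cut sites depend on guide RNA placement and, as stated above, yield targets of 0.1–0.2 Mb.

```python
def tile_targets(locus_start, locus_end, n_targets, target_len):
    """Return n_targets evenly spaced (start, end) windows of length
    target_len that together cover [locus_start, locus_end).

    Hypothetical helper: real targets are bounded by Cas9 cut sites,
    not by an even grid.
    """
    span = locus_end - locus_start
    if n_targets < 2 or target_len <= 0:
        raise ValueError("need at least two targets of positive length")
    if target_len > span:
        raise ValueError("target is longer than the locus")
    if n_targets * target_len < span:
        raise ValueError("targets cannot cover the locus")
    windows = []
    for i in range(n_targets):
        # Integer arithmetic keeps the last window flush with locus_end.
        start = locus_start + (i * (span - target_len)) // (n_targets - 1)
        windows.append((start, start + target_len))
    return windows


def fold_enrichment(on_target_cov, genome_cov):
    """Ratio of on-target coverage to whole-genome coverage."""
    return on_target_cov / genome_cov


# Assumed locus-relative coordinates for a 4 Mb region, 40 targets of 0.2 Mb.
mhc_windows = tile_targets(0, 4_000_000, 40, 200_000)

# 50X on target versus roughly 4X genome-wide.
print(fold_enrichment(50, 4))  # prints 12.5
```

Under these assumptions, adjacent targets overlap by roughly half their length. Because the on-target coverage is reported as more than 50X, the 12.5-fold enrichment is a lower-bound estimate.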