ABSTRACT
High-throughput genotyping facilitates the large-scale analysis of genetic diversity in population genomics and genome-wide association studies that combine the genotypic and phenotypic characterization of large collections of wild and domesticated germplasm. Genotyping by sequencing is progressively replacing traditional genotyping methods because of its lower ascertainment bias. However, genome-wide genotyping by sequencing becomes expensive in species with large genomes and a high proportion of repetitive DNA. Here we describe the use of CRISPR/Cas9 technology to deplete repetitive elements in the 3.76-Gb genome of lentil (Lens culinaris), 84% of which consists of repeats, thus concentrating the sequencing data on coding and regulatory regions (unique regions). We designed a custom set of 566,722 gRNAs, each with at least 25 recognition sites, targeting 2.9 Gb of repeats in sequencing libraries with 500-bp inserts. We excluded repetitive regions overlapping annotated genes and putative regulatory elements based on ATAC-Seq data. The novel depletion method removed 40% of reads mapping to repeats, increasing those mapping to unique and functional regions by 2.6-fold. This repeat-to-unique shift in the sequencing data increased the number of genotyped bases by up to 17-fold compared to non-depleted libraries. We were also able to identify up to 18-fold more genetic variants in the unique regions, and we increased the genotyping accuracy by rescuing thousands of heterozygous variants that would otherwise be missed due to low coverage. The method performed similarly regardless of the multiplexing level, library type, or genotype, including different cultivars and a closely related species (L. orientalis). Our results confirm that CRISPR/Cas9-driven repeat depletion focuses sequencing data on meaningful genomic regions, helping to improve high-density and genome-wide genotyping in large and repetitive genomes.
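The two gRNA design criteria stated above (at least 25 recognition sites per guide, and no targeting of repeats that overlap annotated genes or ATAC-Seq-supported regulatory elements) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the input layout (a mapping from guide sequence to its genomic cut sites, and per-chromosome half-open intervals for protected regions), and the rule of discarding a guide if any one of its sites falls in a protected interval are all assumptions made for illustration.

```python
from bisect import bisect_right


def build_index(intervals):
    """Sort and merge half-open (start, end) intervals for one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent interval: extend the previous one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    starts = [s for s, _ in merged]
    ends = [e for _, e in merged]
    return starts, ends


def is_protected(index, chrom, pos):
    """Return True if pos lies inside a protected interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


def select_grnas(site_hits, protected, min_sites=25):
    """Keep guides with >= min_sites sites, none inside a protected interval.

    site_hits: {guide_sequence: [(chrom, pos), ...]}   (hypothetical layout)
    protected: {chrom: [(start, end), ...]}            (genes, ATAC-Seq peaks)
    """
    index = {chrom: build_index(iv) for chrom, iv in protected.items()}
    kept = []
    for guide, sites in site_hits.items():
        if len(sites) < min_sites:
            continue  # too few recognition sites to be worth including
        if any(is_protected(index, chrom, pos) for chrom, pos in sites):
            continue  # assumed rule: never cut inside genes or regulatory DNA
        kept.append(guide)
    return kept
```

In practice the protected intervals would come from the gene annotation and the ATAC-Seq peak calls, and the site lists from aligning candidate guides to the reference genome. A stricter or looser overlap rule (for example, tolerating a small fraction of protected sites per guide) would change only the second filter.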
Competing Interest Statement
Authors MR and MD are partners of Genartis srl. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.