Abstract
Background Structural Variations (SVs) are highly diverse genomic rearrangements. In the past, their detection was restricted to cytological approaches, and then limited by NGS read length and fragmented assemblies. With the current capabilities of technologies such as long-read sequencing and optical mapping, the detection of larger SVs is becoming increasingly accessible.
This study compares SV detection and characterization from long-read sequencing data obtained with the MinION device developed by Oxford Nanopore Technologies (ONT) and from optical maps produced by the Saphyr device commercialized by Bionano Genomics. The genomes of two Arabidopsis thaliana ecotypes, Columbia-0 (Col-0) and Landsberg erecta 1 (Ler-1), were used to guide the choice between the two technologies.
Results We describe the SVs detected by aligning the best ONT assembly and the DLE-1 optical maps of A. thaliana Ler-1 to the public Col-0 reference TAIR10.1. After filtering, 1,184 and 591 Ler-1 SVs were retained from the ONT and Bionano technologies, respectively. A total of 948 Ler-1 ONT SVs (80.1%) corresponded to 563 Bionano SVs (95.3%), yielding 563 locations common to both technologies. Technology-specific locations were examined to assess the improvement in SV detection provided by each technology. ONT SVs were mostly detected near transposable elements (TEs) and genes, and resistance genes appeared to be particularly affected.
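The percentages above follow directly from the reported counts (948/1,184 and 563/591). The sketch below recomputes them and illustrates why several ONT calls can collapse onto a single common location when two call sets are matched by interval overlap. It is a minimal sketch under stated assumptions: the `SV` record, the `overlaps` and `match_calls` helpers and the 1 kb tolerance are hypothetical choices for illustration, not the matching procedure used in this study.

```python
from typing import List, NamedTuple, Tuple


class SV(NamedTuple):
    """Hypothetical minimal SV record: chromosome and half-open interval (bp)."""
    chrom: str
    start: int
    end: int


def overlaps(a: SV, b: SV, tolerance: int = 1000) -> bool:
    """True if a and b lie on the same chromosome and a, padded by an
    assumed `tolerance` (bp) on each side, intersects b."""
    return (a.chrom == b.chrom
            and a.start - tolerance < b.end
            and b.start < a.end + tolerance)


def match_calls(ont: List[SV], bionano: List[SV],
                tolerance: int = 1000) -> Tuple[List[SV], List[SV], int]:
    """Return (matched ONT SVs, matched Bionano SVs, number of common locations).

    Assumption: a common location is a Bionano SV hit by at least one ONT SV,
    so several ONT calls may map onto the same location (many-to-one)."""
    matched_ont = [o for o in ont if any(overlaps(o, b, tolerance) for b in bionano)]
    matched_bn = [b for b in bionano if any(overlaps(o, b, tolerance) for o in ont)]
    return matched_ont, matched_bn, len(matched_bn)


def pct(part: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / total, 1)


# Percentages reported in the abstract, recomputed from the counts.
print(pct(948, 1184))  # 80.1
print(pct(563, 591))   # 95.3

# Toy data (invented for illustration): two ONT calls hit one Bionano call.
ont = [SV("Chr1", 100, 200), SV("Chr1", 250, 400), SV("Chr2", 5000, 6000)]
bionano = [SV("Chr1", 150, 350), SV("Chr3", 0, 100)]
matched_ont, matched_bn, common = match_calls(ont, bionano)
print(len(matched_ont), len(matched_bn), common)  # 2 1 1
```

In this toy case two ONT SVs match a single Bionano SV, giving one common location; this many-to-one pattern is consistent with 948 ONT SVs corresponding to 563 Bionano SVs and 563 common locations.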
Conclusions SVs linked to ONT sequencing errors were removed and false positives were limited, while high-quality Bionano SVs were retained. When compared with the Col-0 TAIR10.1 reference, most detected SVs were found at the same locations by both technologies. The ONT assembly yields more technology-specific SVs than the Bionano maps, the latter being more efficient at characterizing large SVs. Although both technologies are clearly complementary approaches, ONT data appear better suited to large-scale population studies, while Bionano performs better at improving assemblies and describing the specificities of a genome relative to a reference.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
aurelie.canaguier{at}inrae.fr, romane.guilbaud{at}inrae.fr, erwandenis{at}hotmail.com, gmagdele{at}genoscope.cns.fr, cbelser{at}genoscope.cns.fr, bistace{at}genoscope.cns.fr, cruaud{at}genoscope.cns.fr, marie-christine.le-paslier{at}inrae.fr, pwincker{at}genoscope.cns.fr, vbarbe{at}genoscope.cns.fr
List of abbreviations
- bp: base pairs
- BRK: Break
- CGH: Comparative Genomic Hybridization
- CNV: copy number variation
- Col-0: Arabidopsis thaliana ecotype Columbia-0
- DEL: Deletion
- DLE-1: Direct Label Enzyme 1
- DLS: Direct Label and Stain
- DNA: Deoxyribonucleic Acid
- DUP: Duplication
- Gb: Gigabases
- Hi-C: High-throughput chromatin conformation Capture
- Indels: insertions/deletions
- INS: Insertion
- INV: Inversion
- JMP: Jump
- Kb: kilobases
- Ler-1: Arabidopsis thaliana ecotype Landsberg erecta 1
- LER: Arabidopsis thaliana Ler-1 reference genome published by Zapata et al. 2016
- NA: Not Available
- NGS: Next-Generation Sequencing
- ONT: Oxford Nanopore Technologies
- PAV: presence/absence variation
- RA: Rapid Assembler
- SDN: SMARTdenovo
- SEQ: Sequence
- SNP: Single Nucleotide Polymorphism
- SV: Structural Variation
- TAIR10.1: latest version of the Arabidopsis thaliana Col-0 reference genome, available from The Arabidopsis Information Resource (TAIR) repository
- TE: Transposable Element
- TRA: Translocation