Abstract
Human diploid genome assembly enables the identification of maternal and paternal genetic variations. Algorithms based on 10x linked-read sequencing have been developed for de novo assembly, variant calling and haplotyping. Another linked-read technology, single tube long fragment read (stLFR), has recently provided a low-cost single tube solution that can generate long fragment data. However, no existing software is available for human diploid assembly and variant calling from stLFR data. We develop Aquila stLFR to adapt to the key characteristics of stLFR. Aquila stLFR assembles near-perfect diploid contigs, and the assembly-based variant calling shows that Aquila stLFR detects large numbers of structural variants that are not easily spanned by Illumina short reads. Furthermore, the hybrid assembly mode Aquila hybrid allows a hybrid assembly based on both stLFR and 10x linked-read libraries, demonstrating that these two technologies can always be complementary to each other for assembly, improving contiguity and variant detection regardless of the assembly quality of the library from either single sequencing technology. The overlapping structural variants (SVs) from two independent sequencing datasets of the same individual, together with the SVs from hybrid assemblies, provide a high-confidence profile for studying them.
Availability: Source code and documentation are available at https://github.com/maiziex/Aquila_stLFR.
Footnotes
Table 3 has been updated with the SNP/indel benchmarks. Table 4, Table S1, and Table S2 have been updated through assembly-based variant calling by Aquila_stLFR with the flag "--all regions flag=1". The library names in the paper have been updated to L1_stLFR, L2_stLFR, L1_10x, L3_10x, L1_10x+L2_stLFR and L3_10x+L2_stLFR for clarity (L1_10x and L3_10x are consistent with the library names in the Aquila paper: https://www.biorxiv.org/content/10.1101/660605v1).