PT - JOURNAL ARTICLE
AU - Ko Ikemoto
AU - Hinano Fujimoto
AU - Akihiro Fujimoto
TI - Localized assembly for long reads enables genome-wide analysis of repetitive regions at single-base resolution in human genomes
AID - 10.1101/2022.12.02.518938
DP - 2022 Jan 01
TA - bioRxiv
PG - 2022.12.02.518938
4099 - http://biorxiv.org/content/early/2022/12/03/2022.12.02.518938.short
4100 - http://biorxiv.org/content/early/2022/12/03/2022.12.02.518938.full
AB - Background Long-read sequencing technologies have the potential to overcome the limitations of short reads and provide a comprehensive picture of the human genome. However, it remains hard to characterize repetitive sequences by reconstructing genomic structures at high resolution solely from long reads. Here, we developed a localized assembly method (LoMA) that constructs highly accurate consensus sequences (CSs) from long reads. Methods We first developed LoMA by combining minimap2, MAFFT, and our own algorithm, which classifies diploid haplotypes based on structural variants and constructs CSs. Using this tool, we analyzed two human samples (NA18943 and NA19240) sequenced with the Oxford Nanopore sequencer. We defined target regions in each genome based on mapping patterns and then constructed a high-quality catalog of human insertions solely from the long-read data. Results The assessment of LoMA showed high accuracy of CSs (error rate < 0.3%) compared with raw data (error rate > 8%) and superiority over the previous study. The genome-wide analysis of NA18943 and NA19240 identified 5,516 and 6,542 insertions (≥ 100 bp), respectively. Most insertions (∼80%) were derived from tandem repeats and transposable elements. We also detected processed pseudogenes, insertions in transposable elements, and long insertions (> 10 kbp).
Further, our analysis suggested that short tandem duplications were associated with gene expression and transposons. Conclusions Our analysis showed that LoMA constructs high-quality sequences from long reads that contain substantial errors. This study revealed the true structures of insertions with high accuracy and inferred mechanisms for the insertions. Our approach contributes to future human genome studies. LoMA is available at our GitHub page: https://github.com/kolikem/loma.
Competing Interest Statement The authors have declared no competing interest.
Abbreviations: ONT, Oxford Nanopore Technologies; PacBio, Pacific Biosciences; SV, Structural Variant; TR, Tandem Repeat; TE, Transposable Element; T2T, Telomere-to-Telomere; CS, Consensus Sequence; WGS, Whole-Genome Sequencing; TD, Tandem Duplication; NUMT, Nuclear Mitochondrial DNA sequence; TRF, Tandem Repeats Finder; UTR, Untranslated Region; STR, Short Tandem Repeat; TSD, Target Site Duplication; SD, Standard Deviation