RT Journal Article
SR Electronic
T1 Localized assembly for long reads enables genome-wide analysis of repetitive regions at single-base resolution in human genomes
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 2022.12.02.518938
DO 10.1101/2022.12.02.518938
A1 Ko Ikemoto
A1 Hinano Fujimoto
A1 Akihiro Fujimoto
YR 2022
UL http://biorxiv.org/content/early/2022/12/03/2022.12.02.518938.abstract
AB Background: Long-read sequencing technologies have the potential to overcome the limitations of short reads and provide a comprehensive picture of the human genome. However, it remains hard to characterize repetitive sequences by reconstructing genomic structures at high resolution solely from long reads. Here, we developed a localized assembly method (LoMA) that constructs highly accurate consensus sequences (CSs) from long reads. Methods: We first developed LoMA by combining minimap2, MAFFT, and our own algorithm, which classifies diploid haplotypes based on structural variants and constructs CSs. Using this tool, we analyzed two human samples (NA18943 and NA19240) sequenced with the Oxford Nanopore sequencer. We defined target regions in each genome based on mapping patterns and then constructed a high-quality catalog of human insertions solely from the long-read data. Results: The assessment of LoMA showed that its CSs were highly accurate (error rate < 0.3%) compared with the raw data (error rate > 8%) and more accurate than those of a previous study. The genome-wide analysis of NA18943 and NA19240 identified 5,516 and 6,542 insertions (≥ 100 bp), respectively. Most insertions (∼80%) were derived from tandem repeats and transposable elements. We also detected processed pseudogenes, insertions in transposable elements, and long insertions (> 10 kbp). Further, our analysis suggested that short tandem duplications were associated with gene expression and transposons. Conclusions: Our analysis showed that LoMA constructs high-quality sequences from long reads that contain substantial errors.
This study revealed the true structures of insertions with high accuracy and inferred mechanisms for the insertions. Our approach will contribute to future human genome studies. LoMA is available at our GitHub page: https://github.com/kolikem/loma. Competing Interest Statement: The authors have declared no competing interest. Abbreviations: ONT, Oxford Nanopore Technologies; PacBio, Pacific Biosciences; SV, Structural Variant; TR, Tandem Repeat; TE, Transposable Element; T2T, Telomere-to-Telomere; CS, Consensus Sequence; WGS, Whole-Genome Sequencing; TD, Tandem Duplication; NUMT, Nuclear Mitochondrial DNA sequence; TRF, Tandem Repeats Finder; UTR, Untranslated Region; STR, Short Tandem Repeat; TSD, Target Site Duplication; SD, Standard Deviation.