Abstract
Summary: De novo genome assembly is an important application on both uncharacterized genome assembly and variant identification in a reference-unbiased way. In comparison with de Brujin graph, string graph is a lossless data representation for de novo assembly. However, string graph construction is computational intensive. We propose GraphSeq to accelerate string graph construction by leveraging the distributed computing framework.
Availability and Implementation: GraphSeq is implemented with Scala on Spark and freely available at https://www.atgenomix.com/blog/graphseq.
Supplementary information: Supplementary data are available at Bioinformatics online.
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.