Abstract
Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. To recover single-cell information from such experiments, reads must be grouped by their barcode tag, a crucial processing step that precedes all downstream computations. However, this step is complicated by the high rates of mismatch and deletion errors that can afflict barcodes. Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. This allows reads to be assigned to consensus fingerprints constructed from k-mers, and we show that for single-cell RNA-Seq this improves the recovery of accurate single-cell transcriptome estimates.
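The core idea of building a de Bruijn graph from circularized barcode k-mers can be illustrated with a small sketch. The code below is an illustrative assumption, not the authors' implementation. Each observed barcode is circularized by appending its first k-1 bases, so a barcode of length L contributes exactly L k-mers. Each k-mer becomes a weighted edge between its (k-1)-mer prefix and suffix. A consensus barcode then appears as a cycle of length L. The function names (`circular_kmers`, `build_graph`, `heaviest_cycle`) and the greedy highest-weight walk are hypothetical simplifications. A real method would score and compare many candidate cycles rather than follow a single greedy path.

```python
from collections import Counter, defaultdict


def circular_kmers(barcode, k):
    """Return the L k-mers of a circularized barcode of length L (hypothetical helper)."""
    # Append the first k-1 bases so that every position starts a full k-mer.
    circ = barcode + barcode[:k - 1]
    return [circ[i:i + k] for i in range(len(barcode))]


def build_graph(barcodes, k):
    """Count (k-1)-mer -> (k-1)-mer transitions across all observed barcodes."""
    edges = Counter()
    for bc in barcodes:
        for kmer in circular_kmers(bc, k):
            # Each k-mer is an edge from its prefix to its suffix; weight = read support.
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def heaviest_cycle(edges, start, length):
    """Greedily follow the highest-weight edge from `start` for `length` steps.

    Returns the consensus barcode if the walk closes a cycle back at `start`,
    otherwise None. This greedy walk is an assumed simplification.
    """
    out = defaultdict(list)
    for (u, v), w in edges.items():
        out[u].append((w, v))
    path = [start]
    node = start
    for _ in range(length):
        if not out[node]:
            return None  # dead end: no outgoing edges
        _, node = max(out[node])  # heaviest edge; ties broken by node string
        path.append(node)
    if node != start:
        return None  # walk did not close a cycle of the barcode's length
    # The first base of each node along the cycle spells out the barcode.
    return "".join(n[0] for n in path[:-1])


if __name__ == "__main__":
    # Two error-free copies of ACGTAC plus one read with a mismatch (A->T at position 4).
    reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
    graph = build_graph(reads, k=4)
    print(heaviest_cycle(graph, start="ACG", length=6))
```

In this toy example, the edge supported by two reads (CGT to GTA) outweighs the edge introduced by the single erroneous read (CGT to GTT). The walk therefore recovers ACGTAC as the consensus barcode.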
Copyright
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.