Abstract
With the advent of high-throughput sequencing (HTS), profiling immunoglobulin (IG) repertoires has become an essential part of immunological research. Advances in sequencing technology enable the Ion Torrent Personal Genome Machine (PGM) to cover the full length of IG mRNA transcripts. Nucleotide insertions and deletions (indels) are the dominant errors of the PGM sequencing platform and can critically influence IG repertoire assessments. Here, we present a PGM-tailored IG repertoire sequencing approach that combines error correction through unique molecular identifier (UID) barcoding with indel detection through ImMunoGeneTics (IMGT), the most commonly used sequence alignment database for IG sequences. Using sequences with artificially introduced indels for benchmarking, we found that IMGT detects 98% of the introduced indels through gene-segment frameshifts. Undetected indels are either located at the ends of the sequences or produce masked frameshifts, in which an insertion and a deletion lie in close proximity. IMGT's indel correction algorithm resolves up to 87% of the tested insertions, but no deletions. Through conservative culling, the complementarity-determining regions 3 (CDR3s) are returned 100% correct for sequences carrying up to 3 insertions or 3 deletions. We further show that our PGM-tailored unique molecular identifiers result in highly accurate HTS datasets when combined with the presented data processing. Specifically, considering only sequences with at least two copies from datasets with UID families of at least 3 reads yields correct sequences with over 99% confidence. The protocol and sample processing strategies described in this study will help to establish benchtop-scale sequencing of IG heavy chain transcripts in the field of IG repertoire research.
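The filtering rule stated at the end of the abstract (UID families of at least 3 reads, sequences with at least two copies) can be illustrated with a minimal Python sketch. The function name `filter_by_uid`, its parameters, and the input format of `(uid, sequence)` pairs are hypothetical and chosen for illustration. Taking the most frequent read as the family consensus is a simplifying assumption, not necessarily the consensus method used in this study.

```python
from collections import Counter, defaultdict


def filter_by_uid(reads, min_family_size=3, min_copies=2):
    """Return {sequence: copy_number} for sequences passing both thresholds.

    reads: iterable of (uid, sequence) pairs (hypothetical input format).
    """
    # Group reads into UID families.
    families = defaultdict(list)
    for uid, seq in reads:
        families[uid].append(seq)

    # One consensus per family that has enough reads.
    consensus = []
    for seqs in families.values():
        if len(seqs) < min_family_size:
            continue
        # Assumption: the most frequent read stands in for the consensus.
        top_seq, _ = Counter(seqs).most_common(1)[0]
        consensus.append(top_seq)

    # Keep consensus sequences supported by enough independent families.
    copies = Counter(consensus)
    return {seq: n for seq, n in copies.items() if n >= min_copies}


reads = [
    ("U1", "ACGT"), ("U1", "ACGT"), ("U1", "ACGT"),
    ("U2", "ACGT"), ("U2", "ACGT"), ("U2", "ACGA"),  # one erroneous read
    ("U3", "TTTT"), ("U3", "TTTT"), ("U3", "TTTT"),  # only one family
    ("U4", "ACGT"), ("U4", "ACGT"),                  # family too small
]
result = filter_by_uid(reads)
```

In this toy input, only the sequence supported by two sufficiently large UID families is retained; the singleton consensus and the undersized family are discarded.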
Abbreviations
- CDR3: complementarity-determining region 3
- HTS: high-throughput sequencing
- IG: immunoglobulin
- IGH: immunoglobulin heavy chain
- IMGT: ImMunoGeneTics
- indel: insertions and deletions of nucleotides
- MID: multiplex identifier
- nt: nucleotide
- PGM: (Ion Torrent) Personal Genome Machine
- UID: unique (molecular) identifier
- ssUID: single-side unique molecular identifier