RT Journal Article SR Electronic T1 Population matched (PM) germline allelic variants of immunoglobulin (IG) loci: New pmIG database to better understand IG repertoire and selection processes in disease and vaccination JF bioRxiv FD Cold Spring Harbor Laboratory SP 2020.04.09.033530 DO 10.1101/2020.04.09.033530 A1 Indu Khatri A1 Magdalena A. Berkowska A1 Erik B. van den Akker A1 Cristina Teodosio A1 Marcel J.T. Reinders A1 Jacques J.M. van Dongen YR 2020 UL http://biorxiv.org/content/early/2020/04/10/2020.04.09.033530.abstract AB At the population level, immunoglobulin (IG) loci harbor inter-individual allelic variants in the many different germline IG variable (V), diversity (D) and joining (J) genes of the IG heavy (IGH), IG kappa (IGK) and IG lambda (IGL) loci, which together form the genetic basis of the highly diverse antigen-specific B-cell receptors. These inter-individual allelic variants can be shared between human populations or be specific to one of them. The current IG databases IMGT, VBASE2 and IgPdb hold information about germline alleles, most of which are partial sequences obtained from a mixture of human (B-cell) samples, many with sequence errors and/or acquired (non-germline) IG variations induced by somatic hypermutation (SHM) during antigen-specific B-cell responses. We systematically identified true germline alleles (without SHM) from 26 different human populations around the world, profiled by the “1000 Genomes data”. Our resource is uniquely enriched with complete IG allele sequences and their frequencies across human populations. We identified 409 IGHV, 179 IGKV, and 199 IGLV germline alleles supported by at least seven haplotypes (= minimum of four individuals), after removal of potential false positives based on other genomic databases, i.e. ENSEMBL, TopMed, ExAC, ProjectMine.
Remarkably, the positions of the identified variant nucleotides of the different alleles are not random (as observed in the case of SHM) but show striking patterns restricted to a limited number of nucleotide positions, the same as found in other IG databases, suggesting evolutionary selection processes over time. The identification of these specific patterns provides extra evidence that the identified variant nucleotides are not sequencing errors but genuine allelic variants. The diversity of germline allelic variants in the IGH and IGL loci is highest in Africans, while the IGK locus is most diverse in Europeans. We also report on the presence of recombination signal sequences (RSS) in V pseudogenes, explaining their usage in V(D)J rearrangements. We propose that this new set of genuine germline IG sequences can serve as a new population-matched IG (pmIG) database for better understanding B-cell repertoire and B-cell receptor selection processes in disease and vaccination within and between different human populations. The database is available in FASTA format via GitHub (https://github.com/InduKhatri/pmIG). Contribution to the Field Statement: We present a catalogue of immunoglobulin (IG) germline alleles of unprecedented completeness and accuracy from 26 different human populations belonging to five different large ethnicities (Source: 1000 Genomes). We identified the population distribution of several known germline alleles and identified multiple new alleles, especially in African populations, indicative of high allelic diversity of IG genes in Africa. Notably, the identified variant nucleotides of the different alleles are not random but show striking patterns restricted to a limited number of nucleotide positions, the same as found in other IG databases, suggesting evolutionary selection processes over time. Furthermore, we identified recombination signal sequences in pseudogenes (not previously known).
We provide an overview of IG germline alleles shared with and between known databases and also point to potential sources of non-germline variation and incompleteness in the existing IG databases. More importantly, we believe that this information can serve as a novel population-matched IG (pmIG) database, highly valuable for the research community in supporting the dissection and understanding of differences in effectiveness of antibody-based immune responses in infectious diseases, other (immune) diseases and vaccination within and between human populations. Such knowledge might be used in developing population-specific vaccination strategies, e.g. for the currently ongoing SARS-CoV-2 pandemic. Competing Interest Statement: J. J. M. van Dongen is the founder of the EuroClonality Consortium and one of the inventors on the EuroClonality-owned patents and EuroFlow-owned patents, which are licensed to Invivoscribe, BD Biosciences or Cytognos; these companies pay royalties to the EuroClonality and EuroFlow Consortia, respectively, which are exclusively used for the sustainability of these consortia. J. J. M. van Dongen reports an Educational Services Agreement with BD Biosciences and a Scientific Advisory Agreement with Cytognos to LUMC. The rest of the authors declare that they have no other relevant conflicts of interest.