TY  - JOUR
T1  - A BALB/c IGHV Reference Set, defined by haplotype analysis of long-read VDJ-C sequences from F1 (BALB/c / C57BL/6) mice
JF  - bioRxiv
DO  - 10.1101/2022.02.28.482396
SP  - 2022.02.28.482396
AU  - Katherine JL Jackson
AU  - Justin T Kos
AU  - William Lees
AU  - William S Gibson
AU  - Melissa Laird Smith
AU  - Ayelet Peres
AU  - Gur Yaari
AU  - Martin Corcoran
AU  - Christian E. Busse
AU  - Mats Ohlin
AU  - Corey T Watson
AU  - Andrew M Collins
Y1  - 2022/01/01
UR  - http://biorxiv.org/content/early/2022/03/01/2022.02.28.482396.abstract
N2  - The immunoglobulin genes of inbred mouse strains that are commonly used in models of antibody-mediated human diseases are poorly characterized. This compromises data analysis. To infer the immunoglobulin genes of BALB/c mice, we used long-read SMRT sequencing to amplify VDJ-C sequences from F1 (BALB/c x C57BL/6) hybrid animals. Previously unreported strain variations were identified in the Ighm and Ighg2b genes, and analysis of VDJ rearrangements led to the inference of 278 germline IGHV alleles. 169 alleles are not present in the C57BL/6 genome reference sequence. To establish a set of expressed BALB/c IGHV germline gene sequences, we computationally retrieved IGHV haplotypes from the IgM dataset. Haplotyping led to the confirmation of 162 BALB/c IGHV gene sequences. A musIGHV398 pseudogene variant also appears to be present in the BALB/cByJ substrain, while a functional musIGHV398 gene is highly expressed in the BALB/cJ substrain. Only four of the BALB/c alleles were also observed in the C57BL/6 haplotype. The full set of inferred BALB/c sequences has been used to establish a BALB/c IGHV reference set, hosted at https://ogrdb.airr-community.org. We assessed whether assemblies from the Mouse Genome Project (MGP) are suitable for the determination of the genes of the IGH loci.
Only 37 (43.5%) of the 85 confirmed IMGT-named BALB/c IGHV and 33 (42.9%) of the 77 confirmed non-IMGT IGHV were found in a search of the MGP BALB/cJ genome assembly. This suggests that Adaptive Immune Receptor Repertoire sequencing (AIRR-Seq) data, but not currently-available genome assemblies, are suited to the documentation of germline IGHV genes. Competing Interest Statement: The authors have declared no competing interest.
ER  - 