PT - JOURNAL ARTICLE
AU - Dylan Lawless
AU - Hana Allen Lango
AU - James Thaventhiran
AU - Jolan E. Walter
AU - Rashida Anwar
AU - Sinisa Savic
TI - Characterising RAG1 and RAG2 with predictive genomics
AID - 10.1101/272609
DP - 2018 Jan 01
TA - bioRxiv
PG - 272609
4099 - http://biorxiv.org/content/early/2018/08/02/272609.short
4100 - http://biorxiv.org/content/early/2018/08/02/272609.full
AB - While widespread genome sequencing ushers in a new era of preventive medicine, the tools for predictive genomics are still lacking. Time and resource limitations mean that human diseases remain uncharacterised because of an inability to predict clinically relevant genetic variants. The structural or functional impact of a coding variant is mirrored by allele frequencies amongst the general population. Studies in protein function frequently target sites that are evolutionarily preserved. However, rare diseases are often attributable to variants in genes that are highly conserved. An immunological disorder exemplifying this challenge occurs through damaging mutations in RAG1 and RAG2. RAG deficiency presents at an early age with a distinct phenotype of life-threatening immunodeficiency or autoimmunity. Many tools exist for variant pathogenicity prediction but these cannot account for the probability of variant occurrence. We present variants in RAG1 and RAG2 proteins which are most likely to be seen clinically as disease-causing. Our method of mutation rate residue frequency builds a map of most probable mutations allowing preemptive functional analysis. We compare the accuracy of our predicted probabilities to functional measurements and provide the method for application to any monogenic disorder.
Funding: This work is funded by the University of Leeds 110 Anniversary Research Scholarship and by the National Institute for Health Research (NIHR, grant number RG65966). Dr. Jolan Walter has received federal funding.
The authors declare no conflict of interest.
Acknowledgements: We gratefully acknowledge the participation of all NIHR BioResource volunteers, and thank the NIHR BioResource centres and staff for their contribution. We thank the National Institute for Health Research and NHS Blood and Transplant. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.
Ethics statement: The study was performed in accordance with the Declaration of Helsinki. The NIHR BioResource projects were approved by Research Ethics Committees in the UK and appropriate national ethics authorities in non-UK enrolment centres.
Abbreviations: BCR (B-cell receptor); CADD (combined annotation dependent depletion); CID-G/A (combined immunodeficiency with granuloma and/or autoimmunity); GWAS (genome-wide association studies); Mr (mutation rate); MRF (mutation rate residue frequency); PID (primary immunodeficiency); pLI (probability of being loss-of-function intolerant); RAG1 (recombination activating gene 1); Rf (residue frequency); rf-igf (residue frequency, inverse gene frequency); RNH (RNase H); RSS (recombination signal sequence); SCID (severe combined immunodeficiency); TCR (T-cell receptor); tf-idf (term frequency, inverse document frequency).