bioRxiv
New Results

SNVformer: An Attention-based Deep Neural Network for GWAS Data

Kieran Elmes, Diana Benavides-Prado, Neşet Özkan Tan, Trung Bao Nguyen, Nicholas Sumpter, Megan Leask, Michael Witbrock, Alex Gavryushkin
doi: https://doi.org/10.1101/2022.07.07.499217
Kieran Elmes
1Biological Data Science Lab, School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
2Department of Computer Science, University of Otago, Dunedin, New Zealand
Diana Benavides-Prado
3Strong AI Lab, School of Computer Science, The University of Auckland, Auckland, New Zealand
Neşet Özkan Tan
3Strong AI Lab, School of Computer Science, The University of Auckland, Auckland, New Zealand
Trung Bao Nguyen
3Strong AI Lab, School of Computer Science, The University of Auckland, Auckland, New Zealand
Nicholas Sumpter
4University of Alabama at Birmingham, United States of America
Megan Leask
4University of Alabama at Birmingham, United States of America
Michael Witbrock
3Strong AI Lab, School of Computer Science, The University of Auckland, Auckland, New Zealand
Alex Gavryushkin
1Biological Data Science Lab, School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
  • For correspondence: alex@biods.org

Abstract

Despite being the widely used gold standard for linking common genetic variations to phenotypes and disease, genome-wide association studies (GWAS) suffer from major limitations, partially attributable to their reliance on simple, typically linear, models of genetic effects. More elaborate methods, such as epistasis-aware models, typically struggle with the scale of GWAS data. In this paper, we build on recent advances in neural networks employing Transformer-based architectures to enable such models at a large scale. As a first step towards replacing linear GWAS with a more expressive approximation, we demonstrate prediction of gout, a painful form of inflammatory arthritis arising when monosodium urate crystals form in the joints under high serum urate conditions, from single nucleotide variants (SNVs) using a scalable (long input) variant of the Transformer architecture. Furthermore, we show that sparse SNVs can be efficiently used by these Transformer-based networks without expanding them to a full genome. By appropriately encoding SNVs, we are able to achieve competitive initial performance, with an AUROC of 83% when classifying a balanced test set using genotype and demographic information. Moreover, the confidence with which the network makes its prediction is a good indicator of prediction accuracy. Our results indicate a number of opportunities for extension, enabling full genome-scale data analysis using more complex and accurate genotype-phenotype association models.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • The author list was incorrectly imported from the .pdf in the previous version. Two missing authors have been added now.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted July 11, 2022.
SNVformer: An Attention-based Deep Neural Network for GWAS Data
Kieran Elmes, Diana Benavides-Prado, Neşet Özkan Tan, Trung Bao Nguyen, Nicholas Sumpter, Megan Leask, Michael Witbrock, Alex Gavryushkin
bioRxiv 2022.07.07.499217; doi: https://doi.org/10.1101/2022.07.07.499217

Subject Area

  • Bioinformatics