bioRxiv
Confirmatory Results

Object-Attribute Biclustering for Elimination of Missing Genotypes in Ischemic Stroke Genome-Wide Data

Dmitry I. Ignatov, Gennady V. Khvorykh, Andrey V. Khrunin, Stefan Nikolić, Makhmud Shaban, Elizaveta A. Petrova, Evgeniya A. Koltsova, Fouzi Takelait, Dmitrii Egurnov
doi: https://doi.org/10.1101/2020.10.22.349910
Dmitry I. Ignatov
1National Research University Higher School of Economics, Russia
  • For correspondence: dmitrii.ignatov@gmail.com
Gennady V. Khvorykh
2Institute of Molecular Genetics of National Research Centre “Kurchatov Institute”, Russia
Andrey V. Khrunin
2Institute of Molecular Genetics of National Research Centre “Kurchatov Institute”, Russia
Stefan Nikolić
1National Research University Higher School of Economics, Russia
Makhmud Shaban
1National Research University Higher School of Economics, Russia
Elizaveta A. Petrova
3Pirogov Russian National Research Medical University, Russia
Evgeniya A. Koltsova
3Pirogov Russian National Research Medical University, Russia
Fouzi Takelait
1National Research University Higher School of Economics, Russia
Dmitrii Egurnov
1National Research University Higher School of Economics, Russia

Abstract

Missing genotypes can impair the efficacy of machine learning approaches for identifying genetic risk variants of common diseases and traits. The problem arises when genotypic data are pooled from different experiments that use different DNA microarrays, each characterised by its own pattern of uncalled (missing) genotypes. This can prevent a machine learning classifier from assigning classes correctly. To tackle this issue, we used the well-developed notions of object-attribute biclusters and formal concepts, which correspond to dense subrelations in the binary relation patients × SNPs. The paper reports experimental results of applying a biclustering algorithm to a large real-world dataset collected to study the genetic bases of ischemic stroke. The algorithm identified large dense biclusters in the genotypic matrix for further processing, which in turn significantly improved the quality of machine learning classifiers. Unlike the In-Close4 algorithm for generating formal concepts, the proposed algorithm was also able to generate biclusters for the whole dataset without size constraints.
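
To make the notion of an object-attribute bicluster concrete, the following minimal sketch builds all such biclusters for a toy patients × SNPs relation. It rests on several assumptions that are not stated in the abstract: the relation is encoded as a 0/1 matrix in which 1 means the genotype was called and 0 means it is missing; a bicluster generated by a pair (g, m) in the relation is the pair (m', g'), as in the footnote on Property 5; and density is the fraction of filled cells inside the bicluster, the definition commonly used for OA-biclusters. The matrix `CALLED` and the function name `oa_biclusters` are hypothetical and are not taken from the authors' code.

```python
from typing import Dict, FrozenSet, List, Tuple

# A bicluster is (set of patient indices, set of SNP indices).
Bicluster = Tuple[FrozenSet[int], FrozenSet[int]]

# Invented toy data: rows are patients, columns are SNPs.
# Assumption: 1 = genotype called, 0 = genotype missing.
CALLED: List[List[int]] = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 1],
]


def oa_biclusters(matrix: List[List[int]]) -> Dict[Bicluster, float]:
    """Map each distinct OA-bicluster (m', g') to its density."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # g': the SNPs called for patient g.
    intents = [
        frozenset(j for j in range(n_cols) if matrix[i][j])
        for i in range(n_rows)
    ]
    # m': the patients for whom SNP m is called.
    extents = [
        frozenset(i for i in range(n_rows) if matrix[i][j])
        for j in range(n_cols)
    ]
    result: Dict[Bicluster, float] = {}
    for g in range(n_rows):
        for m in intents[g]:  # every pair (g, m) in the relation
            rows, cols = extents[m], intents[g]
            if (rows, cols) in result:
                continue  # the same bicluster can arise from several pairs
            filled = sum(matrix[i][j] for i in rows for j in cols)
            result[(rows, cols)] = filled / (len(rows) * len(cols))
    return result


if __name__ == "__main__":
    # Keep only biclusters above a (hypothetical) density threshold.
    for (rows, cols), density in oa_biclusters(CALLED).items():
        if density >= 0.85:
            print(sorted(rows), sorted(cols), round(density, 3))
```

Filtering by a density threshold, as in the usage lines, is one plausible way to select submatrices with few missing genotypes before training a classifier; the threshold value here is arbitrary.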

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • dignatov{at}hse.ru, http://www.hse.ru

  • khvorykh{at}img.ras.ru, http://img.ras.ru

  • http://rsmu.ru

  • Corrections were made to Property 5 of Proposition 2: for every $(g,m)\in I$ and every $(h,n) \in [g]_M \times [m]_G$, it follows that $(m',g')=(n',h')$, where the equivalence classes are $[g]_M=\{h \mid h \in G, g'=h'\}$ and $[m]_G=\{n \mid n \in M, n'=m'\}$. Missing references on the coinage of the term bicluster ([5]) and on related Boolean techniques were added.

  • https://github.com/dimachine/OABicGWAS
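
The corrected Property 5 mentioned in the footnotes above can be checked numerically on a small example. The sketch below is only an illustration under assumptions: the 0/1 matrix `RELATION` is invented, the function names are hypothetical, and the bicluster generated by a pair (g, m) is taken to be (m', g') as written in the footnote. It verifies that replacing g by any object with the same attribute set, and m by any attribute with the same object set, yields the same bicluster.

```python
from typing import FrozenSet, List, Set, Tuple

# Invented toy relation: rows are objects (patients), columns are attributes (SNPs).
# Rows 0 and 1 share one intent; columns 0 and 1 share one extent.
RELATION: List[List[int]] = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]


def primes(matrix: List[List[int]]) -> Tuple[List[FrozenSet[int]], List[FrozenSet[int]]]:
    """Return (intents g' for every object, extents m' for every attribute)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    intents = [frozenset(j for j in range(n_cols) if matrix[i][j]) for i in range(n_rows)]
    extents = [frozenset(i for i in range(n_rows) if matrix[i][j]) for j in range(n_cols)]
    return intents, extents


def property5_holds(matrix: List[List[int]]) -> bool:
    """Check (m', g') == (n', h') for all (g, m) in I and (h, n) in [g]_M x [m]_G."""
    intents, extents = primes(matrix)
    for g, g_intent in enumerate(intents):
        # [g]_M: objects with exactly the same intent as g.
        g_class = [h for h, h_intent in enumerate(intents) if h_intent == g_intent]
        for m in g_intent:
            # [m]_G: attributes with exactly the same extent as m.
            m_class = [n for n, n_extent in enumerate(extents) if n_extent == extents[m]]
            for h in g_class:
                for n in m_class:
                    if (extents[n], intents[h]) != (extents[m], intents[g]):
                        return False
    return True


def distinct_biclusters(matrix: List[List[int]]) -> Set[Tuple[FrozenSet[int], FrozenSet[int]]]:
    """Collect the distinct biclusters (m', g') over all pairs (g, m) in I."""
    intents, extents = primes(matrix)
    return {(extents[m], intents[g]) for g in range(len(matrix)) for m in intents[g]}


if __name__ == "__main__":
    print(property5_holds(RELATION), len(distinct_biclusters(RELATION)))
```

A practical consequence, again only as a sketch, is that duplicate rows and columns need not be processed separately: one representative per equivalence class generates the same set of biclusters.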

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted October 25, 2020.

Subject Area

  • Bioinformatics