bioRxiv
New Results

NAToRA, a relatedness-pruning method to minimize the loss of dataset size in genetic and omics analyses

Thiago Peixoto Leal, Vinicius C Furlan, Mateus Henrique Gouveia, Julia Maria Saraiva Duarte, Pablo AS Fonseca, Rafael Tou, Marilia de Oliveira Scliar, Gilderlanio Santana de Araujo, Camila Zolini, Maria Gabriela Campolina Diniz Peixoto, Maria Raquel Santos Carvalho, Maria Fernanda Lima-Costa, Robert H Gilman, Eduardo Tarazona-Santos, Maíra Ribeiro Rodrigues
doi: https://doi.org/10.1101/2021.10.21.465343
Thiago Peixoto Leal
1Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
  • For correspondence: thpeixotol@hotmail.com
Vinicius C Furlan
1Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Mateus Henrique Gouveia
1Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
2Center for Research on Genomics & Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
Julia Maria Saraiva Duarte
1Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Pablo AS Fonseca
1Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
3Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, Canada
Rafael Tou
1Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Marilia de Oliveira Scliar
4Centro de Pesquisa sobre o Genoma Humano e Células-Tronco, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
Gilderlanio Santana de Araujo
5Laboratório de Genética Humana e Médica, Programa de Pós-Graduação em Biologia Molecular, Instituto de Ciências Biológicas, Universidade Federal do Pará, Belem, Brazil
Camila Zolini
1Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
6Beagle, Belo Horizonte, Brazil
7Mosaico Translational Genomics Initiative, Belo Horizonte, Brazil
Maria Gabriela Campolina Diniz Peixoto
8Embrapa Gado de Leite, Embrapa, Juiz de Fora, Brazil
Maria Raquel Santos Carvalho
1Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Maria Fernanda Lima-Costa
9Centro de Pesquisa Rene Rachou, Fundação Oswaldo Cruz, Belo Horizonte, Brazil
Robert H Gilman
10Universidad Peruana Cayetano Heredia, Lima, Perú
11Dept of International Health, Johns Hopkins School of Public Health Baltimore, Baltimore, USA
Eduardo Tarazona-Santos
1Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
7Mosaico Translational Genomics Initiative, Belo Horizonte, Brazil
11Dept of International Health, Johns Hopkins School of Public Health Baltimore, Baltimore, USA
Maíra Ribeiro Rodrigues
1Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
12Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, Brazil

Abstract

Genetic and omics analyses frequently require independent observations, a condition that real datasets do not guarantee. When relatedness cannot be accounted for in the analysis, the usual solution is to remove related individuals (or observations), which reduces the available data. We developed NAToRA, a network-based relatedness-pruning method that minimizes dataset reduction while removing unwanted relationships from a dataset. It uses the node degree centrality metric to identify highly connected nodes (individuals) and implements heuristics that approximate the minimal reduction of a dataset, which allows its application to large datasets. NAToRA outperformed two popular methods (implemented in the software PLINK and KING) by showing the best combination of results: it removed all relatives while keeping the largest possible number of individuals in all datasets tested, with a similar or smaller reduction in genetic diversity. NAToRA is freely available both as a standalone tool that can be easily incorporated into a pipeline and as a graphical web tool that allows visualization of the relatedness networks. NAToRA also accepts a variety of relationship metrics as input, which facilitates its use. We also present the genealogy simulator software used for the tests performed in the manuscript.
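
The idea of pruning a relatedness network by node degree can be illustrated with a minimal Python sketch. This is not NAToRA's implementation: it is a simple greedy heuristic written only for illustration, and the function names (`build_network`, `prune_by_degree`), the input format (tuples of two individual IDs and a relatedness value), and the cutoff of 0.1 are all assumptions. Individuals are nodes, an edge links every pair whose relatedness reaches the cutoff, and the most connected individual is removed repeatedly until no edge remains.

```python
from collections import defaultdict


def build_network(pairs, cutoff):
    """Link every pair of individuals whose relatedness is >= cutoff."""
    adjacency = defaultdict(set)
    for a, b, value in pairs:
        if a != b and value >= cutoff:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def prune_by_degree(pairs, cutoff):
    """Greedily remove the highest-degree individual until no related pair is left.

    Returns the list of removed individuals, in removal order.
    Illustrative heuristic only; it does not guarantee a minimal removal set.
    """
    adjacency = build_network(pairs, cutoff)
    removed = []
    while True:
        # Individuals that still have at least one relative in the network.
        candidates = [(len(nbrs), node) for node, nbrs in adjacency.items() if nbrs]
        if not candidates:
            break
        max_degree = max(degree for degree, _ in candidates)
        # Break ties by ID so that the result is deterministic.
        node = min(n for degree, n in candidates if degree == max_degree)
        for nbr in adjacency.pop(node):
            adjacency[nbr].discard(node)
        removed.append(node)
    return removed


# Hypothetical relatedness values, invented for this example.
example_pairs = [
    ("A", "B", 0.25), ("A", "C", 0.25), ("A", "D", 0.125),
    ("E", "F", 0.5), ("G", "H", 0.01),
]
print(prune_by_degree(example_pairs, cutoff=0.1))  # ['A', 'E']
```

In this toy network, removing the hub A resolves three relationships at once, so B, C, and D are all kept. Removing individuals in arbitrary order could instead discard B, C, and D to resolve the same three pairs.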

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

  • vinicius.cfurlan{at}gmail.com

  • mateus.gouveia{at}nih.gov

  • juliamsd{at}gmail.com

  • pablofonseca.bio{at}gmail.com

  • rafaeltoux{at}gmail.com

  • mariliascliar{at}yahoo.com.br

  • gilderlanio{at}gmail.com

  • camila.ldgh{at}gmail.com

  • gabriela.peixoto{at}embrapa.br

  • ma.raquel.carvalho{at}gmail.com

  • lima.costa{at}fiocruz.br

  • gilmanbob{at}gmail.com

  • edutars{at}icb.ufmg.br

  • maira.r.rodrigues{at}gmail.com

  • Added a co-author; small changes to the text

  • http://ldgh.com.br/natora/

  • https://github.com/ldgh/NAToRA_Public

Copyright 
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted October 28, 2021.

Subject Area

  • Bioinformatics