Abstract
Genetic and omics analyses frequently require independent observations, which real datasets do not guarantee. When relatedness cannot be accounted for in the analysis, the usual solution is to remove related individuals (or observations), which reduces the available data. We developed NAToRA, a network-based relatedness-pruning method that removes unwanted relationships from a dataset while minimizing the reduction in its size. It uses the node degree centrality metric to identify highly connected nodes (individuals) and implements heuristics that approximate the minimal reduction of a dataset, which allows its application to large datasets. NAToRA outperformed two popular methodologies (implemented in the software PLINK and KING), showing the best combination of effective relatedness pruning: it removed all relatives while keeping the largest possible number of individuals in all datasets tested, with a similar or smaller reduction in genetic diversity. NAToRA is freely available both as a standalone tool that can be easily incorporated into a pipeline and as a graphical web tool that allows visualization of the relatedness networks. NAToRA also accepts a variety of relationship metrics as input, which facilitates its use. We also present the genealogy simulator software used for several of the tests performed in the manuscript.
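To illustrate the general idea of degree-based relatedness pruning described above, the following is a minimal sketch and not NAToRA's actual implementation. It assumes a simple greedy heuristic: build a network whose edges join pairs with relatedness at or above a cutoff, then repeatedly remove the individual with the most remaining relatives until no edges are left. The function name `prune_related`, the input layout, the tie-breaking rule, and the cutoff value are all hypothetical choices made for this example.

```python
from collections import defaultdict


def prune_related(pairs, threshold):
    """Greedy degree-based pruning (illustrative sketch only).

    pairs: iterable of (id_a, id_b, relatedness) tuples.
    threshold: pairs with relatedness >= threshold count as related.
    Returns the set of individuals to remove so that no related pair remains.
    """
    # Build the relatedness network: one edge per related pair.
    adj = defaultdict(set)
    for a, b, value in pairs:
        if a != b and value >= threshold:
            adj[a].add(b)
            adj[b].add(a)

    removed = set()
    while True:
        # Degree of every node that still has at least one relative.
        degrees = {node: len(neigh) for node, neigh in adj.items() if neigh}
        if not degrees:
            break  # no related pairs left
        max_degree = max(degrees.values())
        # Assumption: ties are broken by identifier to keep the result deterministic.
        node = min(n for n, d in degrees.items() if d == max_degree)
        # Remove the most connected individual and all of its edges.
        for other in adj.pop(node):
            adj[other].discard(node)
        removed.add(node)
    return removed


# Hypothetical relatedness values; 0.1 is an arbitrary cutoff for this example.
pairs = [
    ("A", "B", 0.25),
    ("A", "C", 0.25),
    ("A", "D", 0.50),
    ("E", "F", 0.50),
    ("B", "G", 0.01),  # below the cutoff, so not treated as related
]
print(sorted(prune_related(pairs, threshold=0.1)))  # prints ['A', 'E']
```

In this toy network, removing the single highly connected individual A resolves three related pairs at once, which is why a degree-based choice tends to retain more individuals than removing one member of each pair independently. A greedy rule of this kind only approximates the minimal reduction and may not find it in every network.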
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
vinicius.cfurlan{at}gmail.com
mateus.gouveia{at}nih.gov
juliamsd{at}gmail.com
pablofonseca.bio{at}gmail.com
rafaeltoux{at}gmail.com
mariliascliar{at}yahoo.com.br
gilderlanio{at}gmail.com
camila.ldgh{at}gmail.com
gabriela.peixoto{at}embrapa.br
ma.raquel.carvalho{at}gmail.com
lima.costa{at}fiocruz.br
gilmanbob{at}gmail.com
edutars{at}icb.ufmg.br
maira.r.rodrigues{at}gmail.com
Added a co-author; small changes to the text.