Abstract
DeepKin is a novel tool designed to predict relatedness from genomic data using convolutional neural networks (CNNs). Traditional methods for estimating relatedness often struggle when genomic data is limited, as with paleogenomes and degraded forensic samples. DeepKin addresses this challenge by leveraging two CNN models trained on simulated genomic data to classify relatedness up to the third-degree and to identify parent-offspring and sibling pairs. Our benchmarking shows DeepKin performs comparably or better than the widely used tool READv2. We validated DeepKin on empirical paleogenomes from two paleological sites, demonstrating its robustness and adaptability across different genetic backgrounds, with accuracy >90% above 10K shared SNPs. By capturing information across genomic segments, DeepKin offers a new methodological path for relatedness estimation in settings with highly degraded samples, with applications in ancient DNA, as well as forensic and conservation genetics.
Competing Interest Statement
The authors have declared no competing interest.