Abstract
Copy number variants (CNVs) are pivotal in driving phenotypic variation that facilitates species adaptation. They are also significant contributors to various disorders, making CNV analysis of ancient genomes crucial for uncovering the genetic origins of disease susceptibility across populations. However, detecting CNVs in ancient DNA (aDNA) samples poses substantial challenges for several reasons. First, aDNA is often highly degraded, and this degradation is compounded by contamination from microbial DNA and DNA from closely related species, which introduces additional noise into the sequencing data. Second, the typically low coverage of aDNA makes accurate CNV detection particularly difficult. Conventional CNV calling algorithms, which are optimized for high coverage and long reads, often underperform under these conditions. To address these limitations, we introduce LYCEUM, a deep learning-based CNV caller specifically designed for low-coverage aDNA. LYCEUM applies transfer learning from a model designed to detect CNVs in another noisy data domain, whole exome sequencing (WES), and then fine-tunes it on a few aDNA samples for which semi-ground-truth CNV calls are available. Our findings demonstrate that LYCEUM accurately identifies CNVs even in highly downsampled genomes and maintains robust performance across a range of coverage levels. Thus, LYCEUM offers researchers a reliable solution for CNV detection in challenging ancient genomic datasets. LYCEUM is available at https://github.com/ciceklab/LYCEUM
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
The first author's middle name has been added.