PT - JOURNAL ARTICLE
AU - Samarth Rangavittal
AU - Robert S. Harris
AU - Monika Cechova
AU - Marta Tomaszkiewicz
AU - Rayan Chikhi
AU - Kateryna D. Makova
AU - Paul Medvedev
TI - RecoverY: K-mer based read classification for Y-chromosome specific sequencing and assembly
AID - 10.1101/148114
DP - 2017 Jan 01
TA - bioRxiv
PG - 148114
4099 - http://biorxiv.org/content/early/2017/06/14/148114.short
4100 - http://biorxiv.org/content/early/2017/06/14/148114.full
AB - Motivation The haploid mammalian Y chromosome is usually under-represented in genome assemblies due to high repeat content and low depth due to its haploid nature. One strategy to ameliorate the low coverage of Y sequences is to experimentally enrich Y-specific material before assembly. Since the enrichment process is imperfect, algorithms are needed to identify putative Y-specific reads prior to downstream assembly. A strategy that uses k-mer abundances to identify such reads was used to assemble the gorilla Y (Tomaszkiewicz et al 2016). However, the strategy required the manual setting of key parameters, a time-consuming process leading to suboptimal assemblies. Results We develop a method, RecoverY, that builds on the previous strategy by automatically selecting the abundance level at which a k-mer is deemed to originate from the Y. This algorithm uses prior knowledge about the Y chromosome of a related species or known Y transcript sequences. We evaluate RecoverY on both simulated and real data, for human and gorilla, and investigate its robustness to important parameters. We show that RecoverY leads to a vastly superior assembly compared to alternate strategies of filtering the reads or contigs.
Compared to the preliminary strategy used in Tomaszkiewicz et al (2016), we achieve a 33% improvement in assembly size and a 20% improvement in the NG50, demonstrating the power of automatic parameter selection. Availability Our tool RecoverY is freely available at https://github.com/makovalab-psu/RecoverY Contact kmakova{at}bx.psu.edu, pashadag{at}cse.psu.edu Supplementary information Attached as an additional file.