Abstract
Genome-wide association studies (GWAS) have provided an avenue for associating common genetic variants with complex traits. However, because GWAS use SNPs as genetic markers, they have been confined to detecting the genetic basis of traits within a species and cannot address large-scale inter-species traits. Here, we propose a practical statistical approach that uses k-mer frequencies as genetic markers to associate genetic variants with large-scale inter-species traits. We applied this approach to chromosome number as a trait across 96 mammalian proteomes and prioritized 130 genes, including TP53 and BAD, of which 6 were candidate genes. These genes were shown to be associated with the cellular response to DNA double-strand breaks caused by chromosome fission/fusion. Our study provides a new and effective genomic strategy for performing association studies on large-scale inter-species traits, using chromosome number as a case study. We hope this approach will enable the exploration of a broad range of traits.